a UMR Marbec,CNRS,IRD, Ifremer, Université de Montpellier, Batiment 24, Campus Triolet, Montpellier,34090 France
1To whom correspondence should be addressed. E-mail: mccheutin@yahoo.es
Abstract
Workflow to generate phyloseq objects, analyze data, and create figures for the influence of the coral-macroalgal shift in Seychelles on microbiomes of seaweeds and gut if reef fish. Here we report both methods and results in plain text and R code. You can choose to show all code or hide the code using the button in the upper right hand corner of the front page (default is hide) as well as download the .Rmd file to tweak the code.knitr::opts_chunk$set(eval = FALSE)
remove(list = ls())
cran_packages <- c("knitr", "phyloseqGraphTest", "phyloseq", "shiny", "microbiome",
"tidyverse", "miniUI", "caret", "pls", "e1071", "ggplot2",
"randomForest","entropart", "vegan", "plyr", "dplyr", "here",
"ggrepel", "nlme", "R.utils", "gridExtra","grid", "googledrive",
"googlesheets", "phangorn", "devtools", "rmarkdown", "sys",
"reshape2", "devtools", "PMA","structSSI","ade4", "ape",
"Biostrings", "igraph", "ggnetwork", "intergraph", "ips",
"scales", "kableExtra", "pgirmess", "treemap", "knitr","kableExtra",
"rstudioapi" ,"data.table","DT","pander","formatR","grDevices","svgPanZoom",
"RCurl","plotly","pairwiseAdonis", "stringr")
github_packages <- c("jfukuyama/phyloseqGraphTest")
bioc_packages <- c("phyloseq", "genefilter", "impute", "dada2", "DECIPHER")
# Install CRAN packages (if not already installed)
#Some packages would be not availbale for your R version
inst <- cran_packages %in% installed.packages()
if (any(! inst)) {
install.packages(cran_packages[!inst], repos = "http://cran.rstudio.com/") }
#
inst <- github_packages %in% installed.packages()
if (any(! inst)) {
devtools::install_github(github_packages[!inst]) }
# Load libraries
sapply(c(cran_packages, bioc_packages), require, character.only = TRUE)
sessionInfo()
set.seed(1000)This document is interactive. You can sort and scroll through most of the tables and the phylogenetic tree is zoomable. In the upper right hand corner of the front page is a Code button. Use this to show or hide all the code in the document (default is hide) as well as download the .Rmd file which you can use to extract the code. Data products from the final paper are highlighted in Red.
\[ID_x = \frac{Variance_x}{\sum_{Abondance_x}}\]
where \(ID_x\) is the index of dispersion of the \(ASV_x\). Each calculated index was compared to a Poisson distribution (Magurran & Henderson, 2003), and tested whether they were included between the 2.5% and 97,5% confidence interval of the Khi2 distribution (Krebs, 1999).
The obtained reads from the sequencing platform (Genotoul, Toulouse) were demultiplexed and linkers with primers sequences were removed before being processed through DADA2 pipeline (Callahan et al., 2016). Briefly, sequences were trimmed and filtered for quality, dereplicated and inferred in ASVs (Glassman & Martiny, 2018). Forward and reverse ASVs were merged and finally generated a count table where chimera were identified and removed. The taxonomy assignment was performed using the SILVA reference database (release 123) (Quast et al., 2013). A phyloseq object (McMurdie & Holmes, 2013) was generated for further statistical analysis.
To minimize the effect of contamination during the dissection, extraction and amplification steps, the ASVs taxonomically assigned to Eukarya Kingdom (< 0.1%), the organelles reads (Chloroplast and Mitochondria ) and to 40 kit contaminant genus described in Salter et al. (2014) were removed from the database. Because of the high variability of sequencing depth, tables of ASVs were normalized based on the lowest number of sequences for each compartment (gut, macroalgae and turf). Thus, this part create several phyloseq object in order to use for the alpha and beta diversity.
Before analyzing the microbial compositions, we assess the host distribution between reefs. The macroalga shift has already been study to alter fish communities and trophic structure (Burkepile & Hay, (2008) ; Nyström et al., (2008)). We first analyze our sampling results in order to underpinned the previous observations and to understand how microbiome is adapted to the health reef status.
We first describe the core microbiome as microbiome constituted by theoritical permanent ASVs. I a first step, enteric and macro-algal core bacteriomes were independently identified by examining the species abundance distribution (SAD) patterns of each ASV and by partitioning the SAD into core and satellite ASVs (Magurran & Henderson, 2003). For this purpose, the index of dispersion for each ASV was calculated as the ratio of the variance to the mean abundance (VMR) multiplied by the occurrence. \[ID_x = \frac{Variance_x}{\sum_{Abondance_x}}\]
where \(ID_x\) is the index of dispersion of the \(ASV_x\). This index was used to model whether lineages follow a Poisson distribution (i.e., stochastic distribution), falling between the 2.5 and 97.5% confidence interval of the χ2 distribution (Krebs, 1999). Concerning the alpha diversity, tables were normalized with the abundance of the minimal sampling size. Richness observed and Shannon index were calculated and compared between both compartment. Then, the beta diversity was calculated with the normalized tables for both cores. A blast on the different ASVs was also made and ploted through Itol in order to determine the origin of the ASVs in the enteric and algal cores. Also, to detect the taxa descriptor of each compartment, we proceeded to a Linear Discriminant analyses on LefSe to detect which ASVs are biomarkers of each compartments (Segata et al., 2011).
In this part, we treated the main issue comparing the environmental samples and enteric core microbiomes from HRs and IRs with alpha diversity and beta diversity with PERMANOVA. Concerning the enteric microbiome, our sampling size permited to test four species (Aprion virescens, Scarus ghobban, Lethrinus mahsena and Lethrinus enigmaticus), two genus (Lethrinus sp., Scarus sp.) and one family (Lutjanidae).
Finally, in order to determine which determinants are the drivers of the microbial composition of the enteric microbiome of the reef fish, we tested the effect of the phylogeny and the diet by comparing the herbivore and the invertivores. As for the Part IV, we generated a LDA in order to detect the microbial taxa biomarkers for each diet.
Ps: All tables and figures presented below were possibly modify before to be insert in the publication. We also include many additional data productes that were not part of the original publication.
First, you need to load the original R1 R2 reads in the /dir_fastq_source folder. Then, download the SILVA database references into the /dir_ref_db folder. Concerning the primers 515F-Y & 926R used for this publication, their sequences are written right here. You can also write a .txt files for each primer you used and load it in the /dir_primers folder to call them when you need it.
primer_515FY <- "GTGYCAGCMGCCGCGGTAA"
primer_926R <- "CCGYCAATTYMTTTRAGTTT"We need to read the folders which contain our sequences,primers sequences, environmental dataset and references.
library(stringr)
nms_seq_runs <- list.files(dir_fastq_source)
paths_seq_runs <- list.files(dir_fastq_source, full.names = TRUE) %>% setNames(nms_seq_runs)
nms_refdb <- list.files(dir_refdb)
paths_refdb <- list.files(dir_refdb, full.names = TRUE) %>% setNames(nms_refdb)
nms_primers <- list.files(dir_primers)
paths_primers <- list.files(dir_primers, full.names = TRUE) %>% setNames(nms_primers)You have R1 and R2 reads with the corresponding sequencing names which can be very ugly. First, get the list of your samples and exctract the sample names:
fns <- sort(list.files(dir_sample_selection, full.names = TRUE))
fns <- fns[str_detect(basename(fns), ".fastq")]
fns_R1 <- fns[str_detect(basename(fns), "R1")]
fns_R2 <- fns[str_detect(basename(fns), "R2")]
if(length(fns_R1) != length(fns_R2)) stop("Forward and reverse files do not match.")
sample_names <- sapply(strsplit(basename(fns_R1), "_"), `[`, 1)
sample_namesOnce sample names cut, you can remove the primer sequence from your data by “counting” the number of nucleotides of each primer. Because nucleotides maybe me flowing, it not advised to remove the sequence of the primer by its nucleotide composition.
library(readr)
primer_set_fwd <- read_lines(paste0(dir_primers, "primer_", "515F-Y" , ".txt"))
primer_set_rev <- read_lines(paste0(dir_primers, "primer_", "926R", ".txt"))
primer_length_fwd <- str_length(primer_set_fwd[2])
primer_length_rev <- str_length(primer_set_rev[1])
# You can also use the sequence of the {r primers description}Now you can plot the quality profiles of your reads (which is usely better on the R1) in order to truncate the sequences before quality drastically decrease. Then, you can filter your sequences and trim the primer length.
dir_quality_plots <- paste0(dir_seq_processing, "/quality_pdf/plots/")
dir_create(dir_quality_plots , recursive = T)
qual_R1 <- plotQualityProfile(fns_R1[1])
qual_R2 <- plotQualityProfile(fns_R2[1])
ggsave(qual_R1, file = paste0(dir_quality_plots , "quality_profile_R1.png"))
ggsave(qual_R2, file = paste0(dir_quality_plots, "quality_profile_R2.png"))
filt_R1 <- str_c(dir_filtered, sample_names, "_R1_filt.fastq")
filt_R2 <- str_c(dir_filtered, sample_names, "_R2_filt.fastq")
names(filt_R1) <- sample_names
names(filt_R2) <- sample_names
set.seed(1000)
out <- filterAndTrim(fns_R1, filt_R1, fns_R2, filt_R2, truncLen=c(240,240),
maxN=0, maxEE=c(2,2), truncQ=2, trimLeft=c(primer_length_fwd,primer_length_rev) , rm.phix=TRUE,
compress=TRUE, multithread=TRUE)
head(out)
sample_names <- sapply(strsplit(basename(filt_R1), "_"), `[`, 1) # Assumes filename = samplename_XXX.fastq.gz
sample_namesR <- sapply(strsplit(basename(filt_R2), "_"), `[`, 1) # Assumes filename = samplename_XXX.fastq.gz
if(!identical(sample_names, sample_namesR)) stop("Forward and reverse files do not match.")
names(filt_R1) <- sample_names
names(filt_R2) <- sample_names
set.seed(1000)Finally, you can learn the error rates for both reads and save them.
errF <- learnErrors(filt_R1,nbases=1e8, multithread=TRUE)
errR <- learnErrors(filt_R2, nbases=1e8, multithread=TRUE)
plotErrors_F <- plotErrors(errF, nominalQ=TRUE)
plotErrors_R <- plotErrors(errR, nominalQ=TRUE)
ggsave(plotErrors_F, file = paste0(dir_quality_plots , "plotErrors_F.png"))
ggsave(plotErrors_R, file = paste0(dir_quality_plots, "plotErrors_R.png"))Ps: If you have big data, please proceed directly at the Part Ib: DADA2 process for BIG DATA.
Now, we dereplicate the filtered reads in order to infer them in dada file to merge.
#Dereplication step
derepFs <- derepFastq(filt_R1, verbose=TRUE)
derepRs <- derepFastq(filt_R2, verbose=TRUE)
names(derepFs) <- sample_names
names(derepRs) <- sample_names
# Inference step
dadaFs <- dada(derepFs, err=errF, multithread=TRUE, pool = T)
dadaRs <- dada(derepRs, err=errR, multithread=TRUE, pool = T)
dadaFs[[1]]
dadaRs[[1]]
# Merging step
mergers <- mergePairs(dadaFs, derepFs, dadaRs, derepRs, verbose=TRUE)
head(mergers[[1]])Now, we can construct our sequence table “seqtab” and remove chimeras. It is recommandable to track the reads through the pipeline process
seqtab <- makeSequenceTable(mergers)
row.names(seqtab) = sample_names
dim(seqtab)
table(nchar(getSequences(seqtab)))
seqtab.nochim <- removeBimeraDenovo(seqtab, method="consensus", multithread=TRUE, verbose=TRUE)
dim(seqtab.nochim)
sum(seqtab.nochim)/sum(seqtab)
saveRDS(seqtab.nochim, paste0(dir_seq_processing,"seqtab.nochim.rds"))
# ____ Track reads through the pipeline ----
getN <- function(x) sum(getUniques(x))
track <- cbind(out, sapply(dadaFs, getN), sapply(mergers, getN), rowSums(seqtab), rowSums(seqtab.nochim))
colnames(track) <- c("input", "filtered", "denoised", "merged", "tabled", "nonchim")
rownames(track) <- sample_names
head(track)
save(track, file = "track_515F-Y_926R.RData")These chunks are highly recommendable if you have a huge mount of sequences !
mergers <- vector("list", length(sample_names))
names(mergers) <- sample_names
for(sam in sample_names) {
cat("Processing:", sam, "\n")
derepFs <- derepFastq(filt_R1[[sam]])
ddF <- dada(derepFs, err=errF, multithread=TRUE)
derepRs <- derepFastq(filt_R2[[sam]])
ddR <- dada(derepRs, err=errR, multithread=TRUE)
merger <- mergePairs(ddF, derepFs, ddR, derepRs)
mergers[[sam]] <- merger
}
rm(derepFs); rm(derepRs)Now the construction of the sequence table
seqtab <- makeSequenceTable(mergers)
row.names(seqtab) = sample_names
dim(seqtab)
table(nchar(getSequences(seqtab)))
saveRDS(seqtab, paste0(dir_seq_processing,"seqtab.rds"))
# If processing a single sample, remove the sapply calls: e.g. replace sapply(dadaFs, getN) with getN(dadaFs)
colnames(track) <- c("input", "filtered", "tabled")
rownames(track) <- sample_names
head(track)
save(track, file = "table_track_A515F-Y_926R.RData")And remove chimeras as the other part :
seqtab.nochim <- removeBimeraDenovo(seqtab, method="consensus", multithread=TRUE, verbose=TRUE)
dim(seqtab.nochim)
sum(seqtab.nochim)/sum(seqtab)
saveRDS(seqtab.nochim, paste0(dir_seq_processing,"seqtab.nochim.rds"))You have already load the SILVA ref files, then you have to assign from them the sequence tables
load(dir_seq_processing , "seqtab.nochim.rds")
path_reference_db <- paste0(dir_refdb, "silva_nr_v132_train_set.fa.gz")
path_species_db <- paste0(dir_refdb, "silva_species_assignment_v132.fa.gz")
taxaRC <- assignTaxonomy(seqtab.nochim, path_reference_db, tryRC=TRUE)
taxaSp <- addSpecies(taxaRC, path_species_db) # This step can be very long
taxa.print <- taxaSp # Removing sequence rownames for display only
rownames(taxa.print) <- NULL
head(taxa.print)
saveRDS(seqtab.nochim,taxaRC, taxaSp, file= paste0(dir_seq_processing,"dada2_files.rds"))Now you can create your phyloseq object with your filtered and assigned sequences. Before running the phyloseq process, you need to clean your worspace in order to free the r stack memory
And set_wd again
We can create our phyloseq object from the seqtab.nochim file created previously.
load(paste0(dir_taxa_assign,"dada2_files.rds"))
load(paste0(dir_taxa_assign,"seqtab.nochim_515F-Y.926R.RData"))
ps <- phyloseq(otu_table(seqtab.nochim, taxa_are_rows=FALSE),tax_table(taxaRC))
# You can use tax_table(taxaSp) if you need the assignment until species levelNow, we will modify our table with ASVs names
dna <- Biostrings::DNAStringSet(taxa_names(ps))
names(dna) <- taxa_names(ps)
ps <- merge_phyloseq(ps, dna)
taxa_names(ps) <- paste0("ASV", seq(ntaxa(ps)))
ps
save(ps, taxaRC,seqtab.nochim, seqtab, file=paste0(dir_taxa_assign ,"ps_515F-Y.926R.RData"))
# First results
ntaxa(ps)
nsamples(ps)
sample_names(ps)[1:5]
rank_names(ps)
otu_table(ps)[1:5, 1:5]
tax_table(ps)[1:5, 1:6]The envdata is constituted of the sampling data for the gut, algae and turf samples. * Tax_information : The informations concerning the species are written in different ways with tax1 as the Species genus, tax2 as the Species_genus nomenclature, tax3 as the SPG_x which is the corresponding two first letters of the species and the first of the genus followed by the x of the sampled individuals in this species. The type, lineage, Order, Family ,Genus and Species are mentioned.
Diet_information: The Diet is mentionned at high scale with the diet3 and at granulous scales following the ecological traits in Mouillot et al., (2014). The diet4 mentionned the difference in the herbivory for the Scaridae, Acanthuridae and Siganidae species. Also, the trophic position is mentionned with the FishBase reference.
Individual_information: For each individual, the weight mass, total length, gut mass and sex were obtained.
Site_information: Site with the GPS coordonates were taken and the geomorphology were attributed to the condition of the reef.
envdata <- read.csv2(paste0(dir_refdb, "env.csv"))
colnames(envdata)[1] = "ID"
datatable(envdata[-c(3,6,10, 12)], rownames = FALSE, width = "100%",
colnames = c("ID", "Species","Species number" ,"Compartment","Order","Family","Genus","Diet abreviation","Diet High scale","Diet Low scale","Trophic position", "Mass (g)","Total Length (cm)","Gut mass (g)","Sex","Site","Reef condition","GPS latitude","GPS longitude", "Island", "Substrat"),
caption = htmltools::tags$caption(style = "caption-side:
bottom; text-align: left;",
"Table: ",
htmltools::em("Sample presentation.")),
extensions = "Buttons",
options = list(columnDefs =
list(list(className = "dt-left", targets = 0)),
dom = "Blfrtip", pageLength = 5,
lengthMenu = c(5, 10, 25, 50),
buttons = c("csv", "copy"),
scrollX = TRUE, scrollCollapse = TRUE))It’s just an parenthesis for ordering envdata depending of your phyloseq objects. It is a step a bit borring but necessary if your env file is not synchro with your phyloseq object, meaning that the samples names in your env file are not the same that your sample names in the phyloseq object. First, we will order the names
load(paste0(dir_taxa_assign, "dada2_files.RData"))
nochim_names <- rownames(seqtab.nochim)
env_names <- as.character(envdata$ID)Now, the aim is to have exactly the same names in the phyloseq object and in the env table.
lecture <- cbind(sort(nochim_names),sort(env_names))
identical(lecture[,1], lecture[,2]) If it’s True, you’re names are the same, directly pass to the merging step; If False, you have to replace the correct names. Also, you have to check your env file with the names of your fish species, genus or family which could be incorrect.
levels(factor(envdata$family))
levels(factor(envdata$species))Now that the seqtab and the envdata are corresponding, we will merge them into phyloseq object.
load(paste0(dir_taxa_assign,"ps_515F-Y.926R.RData"))
envdata <- read.csv2(paste0(dir_refdb, "env.csv"))
colnames(envdata)[1] = "ID"
DAT <- sample_data(envdata)
DAT
sort(sample_names(DAT)) == sort(sample_names(ps)) # must be true to be merged
ps1<- merge_phyloseq(ps, DAT)
ps1
save(ps1, envdata,file=paste0(dir_taxa_assign ,"ps1_515F-Y.926R.RData"))Plot the sample minimum ASV
min(rowSums(ps1@otu_table@.Data))
readsumsdf = data.frame(nreads = sort(taxa_sums(ps1),TRUE), sorted = 1:ntaxa(ps1), type = "ASVs")
readsumsdf = rbind(readsumsdf, data.frame(nreads = sort(sample_sums(ps1), TRUE), sorted = 1:nsamples(ps1), type = "Samples"))
title = "Total number of reads"
pdf(file = paste0(dir_quality_plots, "total_number_of_reads.pdf"), he = 7, wi = 7)
p = ggplot(readsumsdf, aes(x = sorted, y = nreads)) + geom_bar(stat = "identity")
p + ggtitle(title) + scale_y_log10() + facet_wrap(~type, 1, scales = "free")
dev.off()We will filter our ps1 to only keep the Prokaryotes
load(paste0(dir_taxa_assign, "ps1_515F-Y.926R.RData"))
ps_sey_proka <- subset_taxa(ps_sey , Kingdom %in% c("Archaea", "Bacteria"))
save(ps_sey_proka, file = paste0(dir_taxa_assign, "ps_sey_proka.RData"))Then, we will obtain the phylogenetic distances between ASVs The tree was obtained after aligning the sequences on mothur and was names “Sey.tree” and was load in the *_taxa_assign* folder. The assignment was made against the ARB database available on SILVA database reference (v132) and proceeded on Mothur platform.
sey_tree <- read.tree(paste(dir_taxa_assign, "Sey.tree"))
sey_tree2 <- root(sey_tree, "ASV40619", resolve.root = T)
sey_tree2 <- drop.tip(sey_tree2,"ASV40619" ) # To delete the outgroup ASV
library(picante)
cal1<-makeChronosCalib(sey_tree2, node = "root", age.min = 1, age.max = 1, interactive = FALSE, soft.bounds = FALSE) #calibration for ultrametric branchs.
sey_chronogramme<-chronos(sey_tree2, lambda=0, model = "discrete", cal=cal1, quiet = FALSE, control=chronos.control(nb.rate.cat=1))
save(sey_chronogramme ,file = paste0(dir_taxa_assign ,"sey_chronogramme.Rdata"))
ps_sey_tree <- merge_phyloseq(ps_sey_proka, sey_tree2)
save(ps_sey_tree, file = paste0(dir_taxa_assign, "ps_sey_tree.RData"))Now that the phyloseq is only made by Prokaryotes which phylogenetic distances are associated, we will finally filter all organels (Chloroplast and Mitochondria) and potential contaminants of the exctraction kits and steps of amplification as mentionned in Salter et al., 2014.
cont_list <- read.csv("/Users/marie-charlottecheutin/Drive/Thesis/Seychelles/WP5.3_data/data_sources/reference_databases/list_potential_contaminants.csv")
contaminant <- cont_list$Genus
sey_final <- subset_taxa(ps_sey_tree, Order != "Chloroplast")
sey_final <- subset_taxa(sey_final , Family != "Mitochondria")
sey_contaminant <- subset_taxa(sey_final , Genus %in% contaminant)
save(sey_contaminant , file = paste0(dir_taxa_assign, "sey_contaminant.Rdata"))
sey_final <- subset_taxa(sey_final , !Genus %in% contaminant)
save(sey_final, file = paste0(dir_taxa_assign,"sey_final.RData"))
seyrff_final <- prune_samples(sample_sums(sey_final) >= min(sample_sums(sey_final)) , sey_final)
seyrff_final <- rarefy_even_depth(seyrff_final, sample.size = min(sample_sums(sey_final)))
save(seyrff_final, file = paste0(dir_taxa_assign, "seyrff_final.RData"))Now we have the final phyloseq for all the samples, we can subset the phyloseq object for the gut, the algae and the turf and rarefy them. To be sure that the minimal abundance of the data set is enough to capture all the diversity, we use the function ggrare()
ggrare <- function(physeq_object, step = 10, label = NULL, color = NULL, plot = TRUE, parallel = FALSE, se = TRUE) {
x <- methods::as(phyloseq::otu_table(physeq_object), "matrix")
if (phyloseq::taxa_are_rows(physeq_object)) { x <- t(x) }
## This script is adapted from vegan `rarecurve` function
tot <- rowSums(x)
S <- rowSums(x > 0)
nr <- nrow(x)
rarefun <- function(i) {
cat(paste("rarefying sample", rownames(x)[i]), sep = "\n")
n <- seq(1, tot[i], by = step)
if (n[length(n)] != tot[i]) {
n <- c(n, tot[i])
}
y <- vegan::rarefy(x[i, ,drop = FALSE], n, se = se)
if (nrow(y) != 1) {
rownames(y) <- c(".S", ".se")
return(data.frame(t(y), Size = n, Sample = rownames(x)[i]))
} else {
return(data.frame(.S = y[1, ], Size = n, Sample = rownames(x)[i]))
}
}
if (parallel) {
out <- parallel::mclapply(seq_len(nr), rarefun, mc.preschedule = FALSE)
} else {
out <- lapply(seq_len(nr), rarefun)
}
df <- do.call(rbind, out)
# Get sample data
if (!is.null(phyloseq::sample_data(physeq_object, FALSE))) {
sdf <- methods::as(phyloseq::sample_data(physeq_object), "data.frame")
sdf$Sample <- rownames(sdf)
data <- merge(df, sdf, by = "Sample")
labels <- data.frame(x = tot, y = S, Sample = rownames(x))
labels <- merge(labels, sdf, by = "Sample")
}
# Add, any custom-supplied plot-mapped variables
if ( length(color) > 1 ) {
data$color <- color
names(data)[names(data) == "color"] <- deparse(substitute(color))
color <- deparse(substitute(color))
}
if ( length(label) > 1 ) {
labels$label <- label
names(labels)[names(labels) == "label"] <- deparse(substitute(label))
label <- deparse(substitute(label))
}
p <- ggplot2::ggplot(data = data,
ggplot2::aes_string(x = "Size",
y = ".S",
group = "Sample",
color = color))
p <- p + ggplot2::labs(x = "Sequence Sample Size", y = "Species Richness")
if (!is.null(label)) {
p <- p + ggplot2::geom_text(data = labels,
ggplot2::aes_string(x = "x",
y = "y",
label = label,
color = color),
size = 4, hjust = 0)
}
p <- p + ggplot2::geom_line()
if (se) { ## add standard error if available
p <- p +
ggplot2::geom_ribbon(ggplot2::aes_string(ymin = ".S - .se",
ymax = ".S + .se",
color = NULL,
fill = color),
alpha = 0.2)
}
if (plot) {
plot(p)
}
invisible(p)
}And now use it and create the phyloseq objects
load(paste0(dir_taxa_assign, "sey_final.RData"))
# Fish ps
sey_gut <- subset_samples(sey_final, type == "gut")
sey_gut <- prune_taxa(names(which(colSums(sey_gut@otu_table)>0)), sey_gut)
save(sey_gut, file = paste0(dir_taxa_assign, "sey_gut.RData"))
sort(sample_sums(sey_gut))
set.seed(10000)
p_gut <- ggrare(sey_gut, step = 500, color = "geomorpho", label = "tax1", se = FALSE)seyrff_gut <- prune_samples(sample_sums(sey_gut) >= min(sample_sums(sey_gut)) , sey_gut)
seyrff_gut <- rarefy_even_depth(seyrff_gut, sample.size = min(sample_sums(sey_gut)))
save(seyrff_gut, file = paste0(dir_taxa_assign, "seyrff_gut.RData"))
# Algae ps
sey_algae <- subset_samples(sey_final, tax1 %in% c("macroalgae", "turf"))
sey_algae <- prune_taxa(names(which(colSums(sey_algae@otu_table)>0)), sey_algae)
save(sey_algae, file = paste0(dir_taxa_assign, "sey_algae.RData"))
set.seed(10000)
p_algae <- ggrare(sey_algae, step = 500, color = "geomorpho", label = "tax1", se = FALSE)seyrff_algae <- prune_samples(sample_sums(sey_algae) >= min(sample_sums(sey_algae)) , sey_algae)
seyrff_algae <- rarefy_even_depth(seyrff_algae, sample.size = min(sample_sums(sey_algae)))
save(seyrff_algae, file = paste0(dir_taxa_assign, "seyrff_algae.RData"))Here again, we will remove all the object to clean the memory
And set_wd again
In this is part, we will focus on our sampling dataset and analyze the distribution of the hosts between the IRs and HRs.
envdata <- read.csv2(paste0(dir_refdb, "env.csv"))
colnames(envdata)[1] = "ID"
load(paste0(dir_taxa_assign, "sey_gut.RData"))
load(paste0(dir_taxa_assign, "sey_final.RData"))We will first write the table of the species and their corresponding diet.
diet_sp <- as.data.frame(table(envdata$diet4, envdata$tax1))[which(as.data.frame(table(envdata$diet4, envdata$tax1))[,3]> 0),][,c(1,2)]
sp_reef <- table(envdata$tax1, envdata$geomorpho)
sp_reef <- cbind(as.data.frame(sp_reef[,1]), as.data.frame(sp_reef[,2]))
table_diet_sp <- cbind(diet_sp[,2],diet_sp[,1],sp_reef[,c(1,2)])
table_to_print <- rbind(table_diet_sp[-c(26,37),], table_diet_sp[c(26,37),])
datatable(table_to_print, rownames = F, width = "100%",
colnames = c("Species", "Diet", "HR","IR"),
caption = htmltools::tags$caption(style = "caption-side:
bottom; text-align: left;",
"Table: ",
htmltools::em("Sampling table of the species and diet between reefs.")),
extensions = "Buttons",
options = list(columnDefs =
list(list(className = "dt-left", targets = 0)),
dom = "Blfrtip", pageLength = 5,
lengthMenu = c(5, 10, 25, 50),
buttons = c("csv", "copy"),
scrollX = TRUE, scrollCollapse = TRUE))The distribution of the sampling set is a first result showing the influence of the shift on the fish communities. How are they distributed?
samp_data <- envdata[envdata$type=="gut",]
fam_tab <- table(samp_data$site,samp_data$family)
# Transform to log
fam.log <- log1p(fam_tab) # Equivalent: log(fam_tab + 1)
# Principal coordinate analysis and simple ordination plot
fam.D <- vegdist(fam.log, "bray")
res <- pcoa(fam.D)
#res$values
biplot(res, fam.log)#round(res$values$Relative_eig[1]*100, 1) # 57.8 %
#round(res$values$Relative_eig[2]*100, 1) # 26 %
site1 <- c("C1","C2","C3","C4", "M1","M2","M3")
site2 <- c(rep("coral", 4), rep("macroalgal",3))
site_data <- cbind(site1,site2)
colnames(site_data) <- c("site", "geomorpho")
site_data <- as.data.frame(site_data)
adonis(fam.D ~ geomorpho, data = site_data)##
## Call:
## adonis(formula = fam.D ~ geomorpho, data = site_data)
##
## Permutation: free
## Number of permutations: 5039
##
## Terms added sequentially (first to last)
##
## Df SumsOfSqs MeanSqs F.Model R2 Pr(>F)
## geomorpho 1 0.39044 0.39044 2.9776 0.37324 0.031 *
## Residuals 5 0.65564 0.13113 0.62676
## Total 6 1.04608 1.00000
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
beta_reef <- betadisper(fam.D, site_data$geomorpho)
permutest(beta_reef)##
## Permutation test for homogeneity of multivariate dispersions
## Permutation: free
## Number of permutations: 5039
##
## Response: Distances
## Df Sum Sq Mean Sq F N.Perm Pr(>F)
## Groups 1 0.004688 0.0046883 0.4681 999 0.512
## Residuals 5 0.050075 0.0100150
Now, we will do the same on trophic srtucture with the diet
samp_data <- envdata[envdata$type=="gut",]
diet_tab <- table(samp_data$site,samp_data$diet4)
# Transform to log
diet.log <- log1p(diet_tab) # Equivalent: log(diet_tab + 1)
# Principal coordinate analysis and simple ordination plot
diet.D <- vegdist(diet.log, "bray")
res <- pcoa(diet.D)
#res$values
par(mfrow=c(1,2))
biplot(res, diet.log)
#round(res$values$Relative_eig[1]*100, 1) # 74.1 %
#round(res$values$Relative_eig[2]*100, 1) # 17.8 %
site1 <- c("C1","C2","C3","C4", "M1","M2","M3")
site2 <- c(rep("coral", 4), rep("macroalgal",3))
site_data <- cbind(site1,site2)
colnames(site_data) <- c("site", "geomorpho")
site_data <- as.data.frame(site_data)
adonis(diet.D ~ geomorpho, data = site_data)##
## Call:
## adonis(formula = diet.D ~ geomorpho, data = site_data)
##
## Permutation: free
## Number of permutations: 5039
##
## Terms added sequentially (first to last)
##
## Df SumsOfSqs MeanSqs F.Model R2 Pr(>F)
## geomorpho 1 0.32919 0.32919 5.2907 0.51412 0.034 *
## Residuals 5 0.31110 0.06222 0.48588
## Total 6 0.64029 1.00000
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
beta_reef <- betadisper(diet.D, site_data$geomorpho)
permutest(beta_reef)##
## Permutation test for homogeneity of multivariate dispersions
## Permutation: free
## Number of permutations: 5039
##
## Response: Distances
## Df Sum Sq Mean Sq F N.Perm Pr(>F)
## Groups 1 0.007975 0.0079747 0.5265 999 0.44
## Residuals 5 0.075726 0.0151452
Because we will focus our sutdy on herbivores and invertivores, we need to know if families are equally distributed.
samp_data_inv <- samp_data[samp_data$diet4 == "Mobile invertebrate",]
fam_tab <- table(samp_data_inv$site,samp_data_inv$family)
# Transform to log
fam.log <- log1p(fam_tab) # Equivalent: log(fam_tab + 1)
# Principal coordinate analysis and simple ordination plot
fam.D <- vegdist(fam.log, "bray")
res <- pcoa(fam.D)
#res$values
biplot(res, fam.log)#round(res$values$Relative_eig[1]*100, 1) # 57.8 %
#round(res$values$Relative_eig[2]*100, 1) # 26 %
site1 <- c("C1","C2","C3","C4", "M1","M2","M3")
site2 <- c(rep("coral", 4), rep("macroalgal",3))
site_data <- cbind(site1,site2)
colnames(site_data) <- c("site", "geomorpho")
site_data <- as.data.frame(site_data)
adonis(fam.D ~ geomorpho, data = site_data)##
## Call:
## adonis(formula = fam.D ~ geomorpho, data = site_data)
##
## Permutation: free
## Number of permutations: 5039
##
## Terms added sequentially (first to last)
##
## Df SumsOfSqs MeanSqs F.Model R2 Pr(>F)
## geomorpho 1 0.02326 0.023259 0.11416 0.02232 0.917
## Residuals 5 1.01868 0.203736 0.97768
## Total 6 1.04194 1.00000
beta_reef <- betadisper(fam.D, site_data$geomorpho)
permutest(beta_reef)##
## Permutation test for homogeneity of multivariate dispersions
## Permutation: free
## Number of permutations: 5039
##
## Response: Distances
## Df Sum Sq Mean Sq F N.Perm Pr(>F)
## Groups 1 0.027353 0.027353 0.6528 999 0.593
## Residuals 5 0.209495 0.041899
Microbial community of fish is very high and it is very difficult to discern less than 10 taxa. That’s why, we kept the same color for the same taxa, in order to be able to compare the graphs between compartemtn, diet or fish families. Of course, if your are only interested in some taxa, better choose few colors. For the bigger table (i.e tax table at Order level), distinctColorPalette() could be very useful but be awared that colors are not repeated. Here is the code :
phylum_colors <- c('Acidobacteria'='lavenderblush4',
'Actinobacteria'='darkblue',
'Armatimonadetes'='cadetblue3',
'Bacteroidetes'='cornflowerblue',
'Calditrichaeota'='azure3',
'Chloroflexi'='#DCE1D2',
'Cyanobacteria'='#DE6554',
'Dadabacteria'='brown2',
'Deferribacteres'='darkslategray1',
'Deinococcus-Thermus'='salmon4',
'Dependentiae'='sandybrown',
'Epsilonbacteraeota'='darkslateblue',
'Elusimicrobia'='plum1',
'Euryarchaeota'='hotpink4',
'Firmicutes'='brown4',
'Fusobacteria'='orange',
'Gemmatimonadetes'='darkolivegreen',
'Kiritimatiellaeota'='darkkhaki',
'Marinimicrobia_(SAR406_clade)'='darkgoldenrod4',
'Latescibacteria'='darkseagreen2',
'Lentisphaerae'='darkseagreen4',
'Patescibacteria'='darkturquoise',
'Planctomycetes'='darkslategray',
'Proteobacteria'='aquamarine4',
'Spirochaetes'='darkolivegreen3',
'Tenericutes'='#CA8EA7',
'Thaumarchaeota'='gold3',
'Verrucomicrobia'='darkgreen',
'WPS-2'='thistle2',
'Other'= 'black',
'Z-Other' = 'black')
class_colors <- c("Acidimicrobiia"="darksalmon",
"Acidobacteriia"="lavenderblush4",
"Actinobacteria"="darkblue",
"Alphaproteobacteria"="lightseagreen",
"Babeliae"="peachpuff",
"Anaerolineae"="tomato2",
"Bacilli"="brown4",
"Bacteroidia"="cornflowerblue",
"Brachyspirae"="darkolivegreen2",
"Campylobacteria"="royalblue",
"Clostridia"="orange3",
"Coriobacteriia"="deepskyblue4",
"Deferribacteres"="darkslategray1",
"Deinococci"="skyblue3",
"Deltaproteobacteria"="skyblue4",
"Erysipelotrichia"="yellow",
"Fusobacteriia"="orange",
"Fimbriimonadia" = "darkseagreen",
"Gammaproteobacteria"="aquamarine4",
"Kiritimatiellae"="darkgray",
"Lentisphaeria"="darkseagreen4",
"Microgenomatia"="seashell3",
"Mollicutes"="#CA8EA7",
"Negativicutes"="palevioletred4",
"Nitrososphaeria"="sandybrown",
"Oxyphotobacteria"="#DE6554",
"Phycisphaerae"="rosybrown4",
"Planctomycetacia"="darkslategray",
"Rhodothermia"="cornsilk3",
"Spirochaetia"="darkolivegreen3",
"Thermoanaerobaculia"="purple",
"Thermoleophilia"="deeppink4",
"Verrucomicrobiae"="darkgreen")
#library(randomcoloR)
#n <- length(levels(core_physeq_order$Order))
#order_colors <- distinctColorPalette(n)Let’s analyze our data previously proceeded. In order to descrimine which ASVs are rare (transients) and which would be considered as permanent (core), we use a dispersion index for each ASV and compare it at the Poisson distribution to determine which ASVs are signifcantly randomly distributed (transiant) and which are not (core) following the method of Fillol et al. (2016).
We create a new folder for the enteric microbiome of reef fish and determine the core.
dir_data_cleaning <- paste0(path, "/analyses/04_data_cleaning/Fish/Core/")
dir.create(dir_data_cleaning, recursive = T)
load(paste0(dir_taxa_assign, "sey_gut.RData"))
load(paste0(dir_taxa_assign, "seyrff_gut.RData"))library(labdsv)
otu_table <- seyrff_gut@otu_table@.Data
abuoccplot_otu <- abuocc(otu_table)## Press return for next plot
## Press return for next plot
## Do you want to identify individual species? Y/N :
## Press return for next plot
## Do you want to identify individual plots? Y/N :
#sub_objects of abuocc objects
str(abuoccplot_otu)## List of 3
## $ spc.plt: Named int [1:99] 15 9 12 38 29 13 28 9 13 6 ...
## ..- attr(*, "names")= chr [1:99] "AVR-001" "AVR-002" "AVR-005" "AVR-007" ...
## $ plt.spc: Named int [1:2768] 1 1 1 1 1 2 1 1 1 1 ...
## ..- attr(*, "names")= chr [1:2768] "ASV30583" "ASV16988" "ASV12732" "ASV538" ...
## $ mean : Named num [1:2768] 1 11 8 9 9 7 6 3 14 5 ...
## ..- attr(*, "names")= chr [1:2768] "ASV30583" "ASV16988" "ASV12732" "ASV538" ...
## - attr(*, "call")= language abuocc(comm = otu_table)
## - attr(*, "comm")= chr [1:14961] "structure(list(ASV30583 = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, " "0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, " "0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, " "0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, " ...
## - attr(*, "timestamp")= chr "Fri Jul 10 16:56:02 2020"
## - attr(*, "class")= chr "abuocc"
# transform spc.plt vector into table in order to calculate specific richness
richness_otu <- data.frame(abuoccplot_otu$spc.plt)
# occurence of each OTU
otu_occurence <- data.frame(abuoccplot_otu$plt.spc)
mean.abun_otu <- colSums(otu_table)/otu_occurence
square_otu <- otu_table^2
ss_otu <- data.frame(colSums(square_otu))
# Variance calculation
variance_otu=ss_otu/otu_occurence-mean.abun_otu^2
disp_otu <- (variance_otu/mean.abun_otu)*otu_occurence
# IC calculation for Poisson distribution using Chi square distribution (value and formula within Zar p574)
library(epitools)
poisic_otu = pois.exact(otu_occurence, conf.level = 0.95)
gut_dstat_otu <- cbind(mean.abun_otu, disp_otu, otu_occurence, poisic_otu)
names(gut_dstat_otu) <- c("average","disp", "occurence", "x", "pt", "rate", "lower", "upper", "prob")
save(gut_dstat_otu, file=paste0(dir_data_cleaning, "gut_sey_dstat_asv.RData"))
# Selection of core ASVs
gut_sey_core_otu <- gut_dstat_otu[gut_dstat_otu$disp > gut_dstat_otu$upper,]
gut_sey_core_otu <- na.exclude(gut_sey_core_otu)
gut_tax <- data.frame(seyrff_gut@tax_table@.Data)
gut_sey_core_otu$tax <- gut_tax[rownames(gut_tax) %in% row.names(gut_sey_core_otu),6]
gut_sey_core_otu$phylum <- gut_tax[rownames(gut_tax) %in% row.names(gut_sey_core_otu),2]
save(gut_sey_core_otu, file = paste0(dir_data_cleaning,"gut_sey_core_otu.Rdata"))
sey_gut_core <-prune_taxa(rownames(gut_sey_core_otu), seyrff_gut)
save(sey_gut_core, file = paste0(dir_data_cleaning, "sey_gut_core.RData"))To blast the reads of the core, we need to generate a fasta file with the function writeXStringSet().
Plot a treemap of the core thanks to the treemap() function for the main contributor of the core.
gut_core_order$Phylum = as.character(gut_core_order$Phylum) # Avoid error message with factor for next step
sort(table(factor(gut_core_order$Phylum)), T)##
## Proteobacteria Firmicutes Planctomycetes Bacteroidetes
## 180 113 40 36
## Cyanobacteria Verrucomicrobia Fusobacteria Spirochaetes
## 36 35 29 21
## Tenericutes Actinobacteria Kiritimatiellaeota Deferribacteres
## 16 6 3 2
gut_core_order[!gut_core_order$Phylum %in% c("Proteobacteria","Bacteroidetes","Cyanobacteria","Firmicutes","Planctomycetes", "Spirochaetes", "Verrucomicrobia","Fusobacteria","Tenericutes"),which(names(gut_core_order) == "Phylum", T)] <- "Other" #Change phylum to other for those not included in the list
group <- gut_core_order$Phylum
subgroup <- gut_core_order$Order
value <- gut_core_order$Abundance
gut_core_treemap_data=data.frame(group,subgroup,value)
library(treemap)
gut_core_treemap <- treemap(gut_core_treemap_data,
index=c("group","subgroup"), vSize = "value", type = "index",
fontcolor.labels=c("white","black"),
fontsize.labels=c(12),bg.labels=c("transparent"),
fontface.labels=c(2,3),
border.col=c("black","white"), border.lwds=c(4,2),
align.labels=list(c("center", "center"),c("left", "bottom")),
title="Seychelles Enteric Core Treemap",fontsize.title=12,
fontfamily.title ="serif")and visualize the relative contribution of the different taxa (with the fill= argument for the level of the taxonomy you want represent) for the different reef condition or site with the argument x=.
Ex : The composition of the enteric core microbiome between reef fish families at phylum level.
To have a representation of the size of the core in the entire community, we can calculate the ratio of the sequences. This can give us informations on the inter individual dispersion : the higher the intra-species dispersion, the lower the size of the core and the larger the variable bacterial community of the compartment or species.
ratio_gut <- sum(sample_sums(sey_gut_core)) / sum(sample_sums(seyrff_gut))
percent(ratio_gut)## [1] "62%"
In order to represent the taxa in the whole microbial description via Blast, we can then represent the source of our sequence (where they have already been described) in an interactive tree in Itol (Interactive Tree of Life). To do that, we need to transform our phylogenetic tree from the phyloseq object (obtained after an alignement in ARB here) in an xml output thanks to the write_xml() function.
devtools::install_github("USCBiostats/rphyloxml")
library(ape)
library(rphyloxml)
set.seed(12)
gut_tree <- sey_gut_core@phy_tree
gut_phyloxml <- write_phyloxml(gut_tree)
cat(as.character(gut_phyloxml))
xml2::write_xml(gut_phyloxml, "gut_phyloxml.xml")Then, in order to add supplementary envdata (other than Blast, as a gradient, host species, geography etc….) you have to use a special template .txt for Itol. The function create_itol_files(env.xlsx) from the script table2itol.R is very useful to made automatically the templates from your env file. You are free to change the colors and shape as you want next. It faster than made a manually entry in the Itol website.
We create a new folder for the algal microbiome and determine the core.
dir_data_cleaning <- paste0(path, "/analyses/04_data_cleaning/Algae/Core/")
dir.create(dir_data_cleaning, recursive = T)
load(paste0(dir_taxa_assign, "sey_algae.RData"))
load(paste0(dir_taxa_assign, "seyrff_algae.RData"))library(labdsv)
otu_table <- seyrff_algae@otu_table@.Data
abuoccplot_otu <- abuocc(otu_table)## Press return for next plot
## Press return for next plot
## Do you want to identify individual species? Y/N :
## Press return for next plot
## Do you want to identify individual plots? Y/N :
#sub_objects of abuocc objects
str(abuoccplot_otu)## List of 3
## $ spc.plt: Named int [1:29] 125 91 167 97 97 100 156 99 138 146 ...
## ..- attr(*, "names")= chr [1:29] "C1-turf-A" "C1-turf-B" "C2-T-A" "C2-T-B" ...
## $ plt.spc: Named int [1:2181] 1 12 1 2 1 1 1 1 8 1 ...
## ..- attr(*, "names")= chr [1:2181] "ASV12005" "ASV1170" "ASV21768" "ASV3441" ...
## $ mean : Named num [1:2181] 65 26.9 32 10 3 ...
## ..- attr(*, "names")= chr [1:2181] "ASV12005" "ASV1170" "ASV21768" "ASV3441" ...
## - attr(*, "call")= language abuocc(comm = otu_table)
## - attr(*, "comm")= chr [1:3754] "structure(list(ASV12005 = c(0, 0, 0, 65, 0, 0, 0, 0, 0, 0, 0, " "0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0), ASV1170 = c(0, " "0, 49, 0, 0, 0, 16, 17, 0, 0, 0, 63, 0, 0, 31, 15, 0, 0, 0, 0, " "0, 0, 9, 14, 31, 49, 16, 13, 0), ASV21768 = c(0, 0, 0, 0, 0, " ...
## - attr(*, "timestamp")= chr "Fri Jul 10 16:56:17 2020"
## - attr(*, "class")= chr "abuocc"
# transform spc.plt vector into table in order to calculate specific richness
richness_otu <- data.frame(abuoccplot_otu$spc.plt)
# occurence of each OTU
otu_occurence <- data.frame(abuoccplot_otu$plt.spc)
mean.abun_otu <- colSums(otu_table)/otu_occurence
square_otu <- otu_table^2
ss_otu <- data.frame(colSums(square_otu))
# Variance calculation
variance_otu=ss_otu/otu_occurence-mean.abun_otu^2
disp_otu <- (variance_otu/mean.abun_otu)*otu_occurence
# IC calculation for Poisson distribution using Chi square distribution (value and formula within Zar p574)
library(epitools)
poisic_otu = pois.exact(otu_occurence, conf.level = 0.95)
algae_dstat_otu <- cbind(mean.abun_otu, disp_otu, otu_occurence, poisic_otu)
names(algae_dstat_otu) <- c("average","disp", "occurence", "x", "pt", "rate", "lower", "upper", "prob")
save(algae_dstat_otu, file=paste0(dir_data_cleaning, "algae_sey_dstat_asv.RData"))
# Selection of core ASVs
algae_sey_core_otu <- algae_dstat_otu[algae_dstat_otu$disp > algae_dstat_otu$upper,]
algae_sey_core_otu <- na.exclude(algae_sey_core_otu)
algae_tax <- data.frame(seyrff_algae@tax_table@.Data)
algae_sey_core_otu$tax <- algae_tax[rownames(algae_tax) %in% row.names(algae_sey_core_otu),6]
algae_sey_core_otu$phylum <- algae_tax[rownames(algae_tax) %in% row.names(algae_sey_core_otu),2]
save(algae_sey_core_otu, file = paste0(dir_data_cleaning,"algae_sey_core_otu.Rdata"))
sey_algae_core <-prune_taxa(rownames(algae_sey_core_otu), seyrff_algae)
save(sey_algae_core, file = paste0(dir_data_cleaning, "sey_algae_core.RData"))To blast the reads of the core, we need to generate a fasta file with the function writeXStringSet().
Plot a treemap of the core thanks to the treemap() function for the main contributor of the core.
algae_core_order$Phylum = as.character(algae_core_order$Phylum) # Avoid error message with factor for next step
sort(table(factor(algae_core_order$Phylum)), T)##
## Proteobacteria Bacteroidetes Cyanobacteria Planctomycetes Verrucomicrobia
## 137 56 32 18 17
## Actinobacteria Firmicutes Fusobacteria Chloroflexi
## 4 2 2 1
algae_core_order[!algae_core_order$Phylum %in%
c("Proteobacteria", "Bacteroidetes","Cyanobacteria","Verrucomicrobia","Planctomycetes"),which(names(algae_core_order) == "Phylum", T)] <- "Other" #Change phylum to other for those not included in the list
group <- algae_core_order$Phylum
subgroup <- algae_core_order$Order
value <- algae_core_order$Abundance
algae_core_treemap_data=data.frame(group,subgroup,value)
library(treemap)
algae_core_treemap <- treemap(algae_core_treemap_data,
index=c("group","subgroup"), vSize = "value", type = "index",
fontcolor.labels=c("white","black"),
fontsize.labels=c(12),bg.labels=c("transparent"),
fontface.labels=c(2,3),
border.col=c("black","white"), border.lwds=c(4,2),
align.labels=list(c("center", "center"),c("left", "bottom")),
title="Seychelles Algae Core Treemap",fontsize.title=12,
fontfamily.title ="serif")and visualize the relative contribution of the different taxa (with the fill= argument for the level of the taxonomy you want represent) for the different reef condition or site with the argument x=.
Ex : The composition of the core algal microbiome between Mahe and Praslin at phylum level.
To have a representation of the size of the core in the entire community, we can calculate the ratio of the sequences. This can give us informations on the inter individual dispersion : the higher the intra-species dispersion, the lower the size of the core and the larger the variable bacterial community of the compartment or species.
ratio_algae <- sum(sample_sums(sey_algae_core)) / sum(sample_sums(seyrff_algae))
percent(ratio_algae)## [1] "57%"
We create a new folder for the total core microbiome and compare both compatment in composition (LefSe) and in diversity (alpha and beta).
dir_data_cleaning <- paste0(path, "/analyses/04_data_cleaning/Total Core/")
dir.create(dir_data_cleaning, recursive = T)
load(paste0(dir_taxa_assign, "sey_final.RData"))
load(paste0(dir_taxa_assign, "seyrff_final.RData"))We use the otu tables of the enteric core and algal core to keep the ASVs names and subset them in the initial phyloseq object sey_final.
core_ASV <- unique(c(row.names(gut_sey_core_otu), row.names(algae_sey_core_otu)))
global_core <- subset_taxa(sey_final, taxa_names(sey_final) %in% core_ASV)
save(global_core, file = paste0(dir_data_cleaning, "global_core.RData"))
core_physeq_order <- global_core %>%
tax_glom(taxrank = "Order") %>% # agglomerate at order level
transform_sample_counts(function(x) {x/sum(x)} ) %>% # Transform to rel. abundance
psmelt() %>% # Melt to long format
arrange(Order)
save(core_physeq_order , file = paste0(dir_data_cleaning, "core_physeq_order.RData"))Now we can visualize easily by compartments the bacterial composition with the chosen taxa levels (e.g. Phylum here).
We compare the taxonomic diversity (observed richness and exp(shannon index) for the effective number of species (ENS)) between the normalized tables (478 ASVs normalized) of the enteric microbiome and epiphytes bacteria. Note that 478 was chosen because Lethrinus samples were very poor but essential for the comparisons between health status of the reef. Following the richness curves made higher during the Phyloseq process
## W stat p-value
## Observed richness 2570.5 1.057345e-10
## Shannon (ENS) 2653.0 4.286702e-12
Even we work on the core of normalized tables, we calculate the beta diversity ont the relative ASVs counts with the Bray-curtis distances with vegdist function from vegan package (Oksanen et al., 2019).
## [1] "Axis 1 : 9%"
## [1] "Axis 2 : 7%"
## [1] "Axis 3 : 5%"
## [1] "Axis 4 : 5%"
We then compare the beta diversity with the adonis function and test the dispersion between samples with betadisper.
library(pairwiseAdonis)
adonis(otu ~ Type, data = samp_data) # between gut, macroalgae and turf##
## Call:
## adonis(formula = otu ~ Type, data = samp_data)
##
## Permutation: free
## Number of permutations: 999
##
## Terms added sequentially (first to last)
##
## Df SumsOfSqs MeanSqs F.Model R2 Pr(>F)
## Type 1 3.716 3.7163 8.6988 0.06458 0.001 ***
## Residuals 126 53.829 0.4272 0.93542
## Total 127 57.545 1.00000
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
pairwise.adonis(otu , samp_data$Type)beta <- betadisper(otu, samp_data$Type)
permutest(beta)##
## Permutation test for homogeneity of multivariate dispersions
## Permutation: free
## Number of permutations: 999
##
## Response: Distances
## Df Sum Sq Mean Sq F N.Perm Pr(>F)
## Groups 1 0.22289 0.222895 84.163 999 0.001 ***
## Residuals 126 0.33369 0.002648
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
In order to detect biomarkers proper to each compartment, we proceed to a Linear discriminant analysis on effect size (LEfSe) available on Galaxy. Here is the code to prepare the data input to proceed in Galaxy hub.
Proceed the “lefse_type_sey.txt” in a LEfse Galaxy and retrieve data “LDA Effect Size” with the associated “Plot LEfSe Results” in the LDA folder.You have to rename the former file “LDA_results” and run the following code.
library(plyr)
LDA_Effect_Size <- read.delim(paste0(dir_data_cleaning, "LDA_results"), header=FALSE)
LDA_gut <- subset(LDA_Effect_Size , LDA_Effect_Size$V3 == "Gut")
LDA_algae <- subset(LDA_Effect_Size , LDA_Effect_Size$V3 == "Algae")
LDA_type <- rbind(LDA_gut,LDA_algae)
LDA_phylum <- LDA_type$V1
names(LDA_phylum) <- ".Phylum.Class.Order.Family.Genus"
write.table(LDA_phylum, file=paste0(dir_data_cleaning, "LDA_phylum.txt"), sep="\t", row.names=F, col.names=F, quote=FALSE)
#paste the names(LDA_phylum) in excel then read it again.
library(readr)
LDA_phylum <- read_delim(paste0(dir_data_cleaning, "LDA_phylum.txt"),".", escape_double = FALSE, col_types = cols(X1 = col_skip()),trim_ws = TRUE)
#It will transform it in tabble and convert "." by separator
LDA_phylum_type <- cbind(LDA_phylum , LDA_type[,2:5])
colnames(LDA_phylum_type) <- c(names(LDA_phylum)[1:5] , "LDA_res" , "Type" , "LDA_res2" , "sig")
write.table(LDA_phylum_type, file=paste0(dir_data_cleaning, "LDA_phylum_type.txt"), sep="\t", row.names=F, col.names=T, quote=FALSE)
LDA_phylum_type <- read_delim(paste0(dir_data_cleaning, "LDA_phylum_type.txt"),"\t", escape_double = FALSE, trim_ws = TRUE)
# Select your significativ value and taxa
LEFSE_sig <- subset(LDA_phylum_type, LDA_phylum_type$LDA_res2 >= 3)
LEFSE_sig_genus <- subset(LEFSE_sig, LEFSE_sig$Genus %in% as.character(na.exclude(LEFSE_sig$Genus)))
table(LEFSE_sig_genus$Type)
LEFSE_sig_genus_vec <- paste(LEFSE_sig_genus$Phylum, LEFSE_sig_genus$Class, LEFSE_sig_genus$Order, LEFSE_sig_genus$Family, LEFSE_sig_genus$Genus, sep = "|")
biomarkers <- paste(LEFSE_sig_genus$Order, LEFSE_sig_genus$Genus, sep = "|")
save(biomarkers, file = paste0(dir_data_cleaning, "biomarkers.RData"))
#sey_samp <- sey_compartment.genus@sam_data
#write.table(sey_samp, file=paste0(dir_data_cleaning, "sey_samp.txt"), sep="\t", row.names=F, col.names=T, quote=FALSE) # You need to numerate the tax3 name for the turf and sargassum samples and re-read it.
sey_samp <- read_table2(paste0(dir_data_cleaning, "sey_samp.txt"))
sample_data(sey_compartment.genus)$tax3 <- sey_samp$tax3
levels(sample_data(sey_compartment.genus)$lineage) <- c("Algae", "Gut")
physeq_phylum <- sey_compartment.genus %>%
transform_sample_counts(function(x) {x/sum(x)} ) %>% # Transform to rel. abundance
psmelt() %>%
filter(Abundance > 0) %>% # Melt to long format
arrange(Genus)
physeq_phylum$Abundance <- physeq_phylum$Abundance/nrow(sample_data(sey_compartment.genus))
physeq_phylum$tax <- paste(physeq_phylum$Phylum, physeq_phylum$Class, physeq_phylum$Order, physeq_phylum$Family, physeq_phylum$Genus, sep = "|")
save(physeq_phylum, file= paste0(dir_data_cleaning, "physeq_phylum.RData"))
write.table(physeq_phylum ,file = paste0(dir_data_cleaning, "physeq_phylum.txt"), sep="\t",quote = F)
Diet_Pol_sey <- physeq_phylum
Diet_Pol_sey$tax <- paste(Diet_Pol_sey$Order, Diet_Pol_sey$Genus, sep = "|")
length(physeq_phylum$tax)
df_sey <- Diet_Pol_sey[,c("lineage","Sample","tax","Abundance")]
colnames(df_sey) <- c("family","item","score","value")
save(df_sey, file = paste0(dir_data_cleaning, "df_sey.RData"))Let’s organize our data to plot corresponding colours depending on the compartment.
load(paste0(dir_data_cleaning , "df_sey.RData"))
load(paste0(dir_data_cleaning , "biomarkers.RData"))
library(plyr)
print_biomarkers <- levels(factor(df_sey[df_sey$score %in% biomarkers,]$score)) # All biomarkers present in the sey.compartment_genus RData file
others <- levels(factor(df_sey[!df_sey$score %in% biomarkers,]$score)) # All taxa which are not biomarkers
df_sey[!df_sey$score %in% biomarkers,]$score <- "Z-Other" # Call them other (Z to figure at the end of the list)
# Some biomarkers are in common so Output the frequences.
freq_gut <- table(df_sey[df_sey$family == "Gut",]$score)/length(df_sey[df_sey$family == "Gut",]$score)*100
freq_gut <- freq_gut[!names(freq_gut) == "Z-Other"]
freq_algae <- table(df_sey[df_sey$family == "Algae",]$score)/length(df_sey[df_sey$family == "Algae",]$score)*100
freq_algae <- freq_algae[!names(freq_algae) == "Z-Other"]
# Keep the common biomarkers of the algae and gut and the respective frequencies in each compartment
biom_common <- names(freq_algae[which(names(freq_algae) %in% names(freq_gut))])
freq_gut_common <- as.data.frame(freq_gut[names(freq_gut) %in% biom_common])
freq_algae_common <- as.data.frame(freq_algae[names(freq_algae) %in% biom_common])
freq_common <- cbind(freq_gut_common, freq_algae_common$Freq)
colnames(freq_common) <- c("Biomarkers", "Freq_gut" ,"Freq_algae")
# Class the biomarkers in the compartment where the value is the higher
biom_gut_to_keep <- freq_common$Biomarkers[which(freq_common$Freq_gut > freq_common$Freq_algae, T)]
biom_algae_to_keep <- freq_common$Biomarkers[which(freq_common$Freq_algae > freq_common$Freq_gut, T)]
# See which are the biomarkers for the gut and which are the biomarkers for the algae
biom_gut <- levels(factor(df_sey[df_sey$family == "Gut",]$score))
biom_gut <- biom_gut[!biom_gut == "Z-Other"]
biom_gut <- biom_gut[!biom_gut %in% biom_algae_to_keep]
biom_algae <- levels(factor(df_sey[df_sey$family == "Algae",]$score))
biom_algae <- biom_algae[!biom_algae == "Z-Other"]
biom_algae <- biom_algae[!biom_algae %in% biom_gut_to_keep]
# Color choice
algae_colors <- c("darkgreen" , "darkkhaki", "darkolivegreen",
"darkolivegreen2","forestgreen","chartreuse",
"aquamarine","aquamarine3", "darkcyan",
"darkseagreen", "yellowgreen", "darkslategray",
"gold3", "gold4", "goldenrod",
"gold","palegreen4","chartreuse3",
"palegreen3", "seagreen3","palegoldenrod","yellow")
pie(rep(1, 22), col= algae_colors)gut_colors <- c("brown", "brown1", "burlywood4",
"chocolate","chocolate4", "coral2",
"darksalmon","darkred","darkorchid4",
"darkmagenta", "blueviolet","darkblue",
"blue3", "deeppink4","deeppink")
pie(rep(1, 15), col= gut_colors)polar_col <- c(algae_colors , gut_colors , 'Z_Other'="black")
# Add a prefix G (for gut) or A (for algae) in their respectiv biomarkers. In this way, they will be listed following
# their respectiv compartment and no their names (easier for the add of the color in the ggplot)
df_sey[df_sey$score %in% biom_gut,]$score <- paste("G",df_sey[df_sey$score %in% biom_gut,]$score, sep= "-")
df_sey[df_sey$score %in% biom_algae,]$score <- paste("A",df_sey[df_sey$score %in% biom_algae,]$score, sep= "-")We use the script of Ladroue et al.,(2012) to draw the histogram with the polarHistogram function available here.
source(paste0(dir_refdb, "polarHistogram.R"))
p <- polarHistogram(df_sey , familyLabel=T, innerRadius = 0.2, spaceFamily =7, circleProportion = 0.85)
p + ggplot2::scale_fill_manual(values= polar_col) +
theme(legend.text = element_text(family = "serif", size = 3)) +
theme(legend.position= "none") # to hide the legend because of the sizeThe microbial compositions between algae and the gut of reef fish is very different with many aerobic groups for the macroalgae and most of all anaerobic for the enteric core microbiomes. The blast analyses confirm the source of our ASVs which are mainly associated with animal host marine microbiomes. Their role have been described as important in the function of nutrition (e.g. Vibrionales, Clostridiales). A first observation remains of the very high and impressive variability between enteric core microbiomes. Now, we wonder what would be the origin of a such variability by testing the diet, the taxonomic effect and the effect of the environment as possible determinant of the variability of the microbiome.
Our main issue is to determine if the shift coral-macroalgal in the Seychelles reefs would influence the composition of the core microbiome of their inhabitants. To test that, we will first analyse the influence on the environmental microbiome through the algae. Then, because a modification of the microbial environment would directly modify the microbiome of their consummer (here, the herbivores and indirectly the invertivores), we will test the effect of the shift on these both communities. We saw that core microbiome was highly variable between individuals. The taxonomic effect is already known to be a major determinant in the microbial composition in fishes. That is why, we will test the influence of the shift at different taxonomical levels (family, genus and species) when the sampling size was sufficient.
We will first create once again a new folder for the effect of the shift and other environmental parameters and test both on alpha and beta diversity.
dir_data_cleaning <- paste0(path, "/analyses/04_data_cleaning/Algae/Shift/")
dir.create(dir_data_cleaning, recursive = T)load(paste0(dir_taxa_assign, "seyrff_algae.RData"))
box1 = plot_richness(seyrff_algae , measures = c("Observed","Shannon") , color = "geomorpho")
# ____ Shannon index (ENS)
box1$data[box1$data$variable == "Shannon",]$value = exp(box1$data[box1$data$variable == "Shannon",]$value)
levels(box1$data$variable)= c("Observed richness","Shannon (ENS)")
box1$data$geomorpho <- str_replace_all(box1$data$geomorpho, c("macroalgal"= "M" , "coral"="C"))
library(ggsignif)
library(ggplot2)
palette = c("darkblue","darkred")
sey_algae_alpha_div = ggplot(box1$data, aes(x = factor(geomorpho) , y = value, color = factor(geomorpho))) +
geom_jitter(position = position_jitter(width = .20), alpha = 0.5, size = 3) +
scale_colour_manual(values= palette) +
theme_bw()+
geom_boxplot(alpha=0.1, outlier.colour = NA) +
theme(axis.title.x = element_blank(),
axis.title.y = element_blank(),
axis.text = element_text(family = "serif",size = 14),
axis.text.x = element_text(family = "serif",size = 18, angle = 0),
panel.grid.major = element_blank(),
panel.grid.minor = element_blank(),
legend.position="none",
strip.text.x = element_text(size=12, family = "serif")) +
facet_wrap( ~ variable, nrow=1, ncol=5, scales = "free") +
geom_signif(comparisons = list(c("C", "M")),
map_signif_level = TRUE, textsize=5, color="black", family = "serif")
sey_algae_alpha_div# ____ Test MW ----
library(pgirmess)
# Does exist differences in diversity between types of host?
# ____________ Observed richness MW
MW.obs = wilcox.test(box1$data$value[box1$data$variable=="Observed richness"]~box1$data$geomorpho[box1$data$variable=="Observed richness"])
MW.obs = cbind(MW.obs$statistic, MW.obs$p.value)
# ____________ Shannon number MW
MW.shannon = wilcox.test(box1$data$value[box1$data$variable=="Shannon (ENS)"]~box1$data$geomorpho[box1$data$variable=="Shannon (ENS)"])
MW.shannon =cbind(MW.shannon$statistic, MW.shannon$p.value)
# ____________ All computed
MW.test <- rbind(MW.obs, MW.shannon)
rownames(MW.test) = c("Observed richness","Shannon (ENS)")
colnames(MW.test) = c('W stat' , 'p-value')
MW.test## W stat p-value
## Observed richness 62 0.06348685
## Shannon (ENS) 60 0.05108988
load(paste0(path, "/analyses/04_data_cleaning/Algae/Core/sey_algae_core.RData"))
core_algae_rel <- transform_sample_counts(sey_algae_core, function(x) x / sum(x) )
save(core_algae_rel , file = paste0(dir_data_cleaning, "core_algae_rel.RData"))
library(vegan)
library(ape)
otu <- vegdist(core_algae_rel@otu_table, method = "bray")
save(otu, file = paste0(dir_data_cleaning, "beta_matrices.RData"))
pcoa.sub <- pcoa(otu)
save(pcoa.sub, file = paste0(dir_data_cleaning, "pcoa.values.RData"))
# Contruction of the table for graphic
pcoa_coord <- pcoa.sub$vectors[,1:3]
# _____ PCOA Plot-----
library(stringr)
samp_data <- data.frame(sample_data(core_algae_rel))
save(samp_data,file = paste0(dir_data_cleaning, "samp_data.RData"))
hull <- cbind(pcoa_coord, samp_data)
hull$geomorpho <- str_replace_all(hull$geomorpho, c("macroalgal"= "M" , "coral"="C"))
# What is the percentage of the explicative variance?
paste("Axis 1 :",percent(pcoa.sub$values$Relative_eig[1])) ## [1] "Axis 1 : 18%"
paste("Axis 2 :",percent(pcoa.sub$values$Relative_eig[2])) ## [1] "Axis 2 : 15%"
paste("Axis 3 :",percent(pcoa.sub$values$Relative_eig[3]))## [1] "Axis 3 : 8%"
paste("Axis 4 :",percent(pcoa.sub$values$Relative_eig[4]))## [1] "Axis 4 : 6%"
palette = c("darkblue","darkred")
pcoa <- ggplot() +
geom_hline(yintercept=0, colour="lightgrey", linetype = 2) +
geom_vline(xintercept=0, colour="lightgrey", linetype = 2) +
geom_point(data = hull, aes(x=Axis.1, y=Axis.2, color = geomorpho), alpha = 0.7, size = 3, shape = 17) +
scale_color_manual(values = palette)+
xlab(paste("PCo1 (", round(pcoa.sub$values$Relative_eig[1]*100, 1), "%)")) +
ylab(paste("PCo2 (", round(pcoa.sub$values$Relative_eig[2]*100, 1), "%)")) +
theme_bw() +
coord_equal() +
theme(axis.title.x = element_text(size=14), # remove x-axis labels
axis.title.y = element_text(size=14), # remove y-axis labels
panel.background = element_blank(),
panel.grid.major = element_blank(), #remove major-grid labels
panel.grid.minor = element_blank(), #remove minor-grid labels
plot.background = element_blank()) + theme(axis.title = element_text(family = "serif", size = 14),
legend.text = element_text(size = 11, family = "serif"),
legend.title = element_text(size = 11,family = "serif")) +
labs(colour = "Site", fill = "Reef")
pcoa# _____ Permanovas and Betadisper ------
samp_data <- read_delim(paste0(dir_data_cleaning,"samp_data.txt"),"\t", escape_double = FALSE, trim_ws = TRUE)Let’s test the different microbial compositions in function of the geomorphology of the site (coral or macroalgal cover), the site and the substrata.
# Between geomorpho, site , substrat
adonis(otu ~ geomorpho, data = samp_data)##
## Call:
## adonis(formula = otu ~ geomorpho, data = samp_data)
##
## Permutation: free
## Number of permutations: 999
##
## Terms added sequentially (first to last)
##
## Df SumsOfSqs MeanSqs F.Model R2 Pr(>F)
## geomorpho 1 0.6960 0.69604 2.1204 0.07282 0.009 **
## Residuals 27 8.8628 0.32825 0.92718
## Total 28 9.5589 1.00000
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
adonis(otu ~ site, data = samp_data)##
## Call:
## adonis(formula = otu ~ site, data = samp_data)
##
## Permutation: free
## Number of permutations: 999
##
## Terms added sequentially (first to last)
##
## Df SumsOfSqs MeanSqs F.Model R2 Pr(>F)
## site 6 3.4182 0.56970 2.041 0.35759 0.001 ***
## Residuals 22 6.1407 0.27912 0.64241
## Total 28 9.5589 1.00000
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
adonis(otu ~ substrat, data = samp_data)##
## Call:
## adonis(formula = otu ~ substrat, data = samp_data)
##
## Permutation: free
## Number of permutations: 999
##
## Terms added sequentially (first to last)
##
## Df SumsOfSqs MeanSqs F.Model R2 Pr(>F)
## substrat 2 1.2421 0.62106 1.9416 0.12994 0.006 **
## Residuals 26 8.3168 0.31988 0.87006
## Total 28 9.5589 1.00000
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Because we had two types of macroalgae, and overall two different clades (brown macroalgae with the Sargassum and a green macroalgae with the turf)n, we tested if the environment would always plays a role inside a type of algae.
adonis(otu ~ type/geomorpho, strata= samp_data$type, data = samp_data)##
## Call:
## adonis(formula = otu ~ type/geomorpho, data = samp_data, strata = samp_data$type)
##
## Blocks: strata
## Permutation: free
## Number of permutations: 999
##
## Terms added sequentially (first to last)
##
## Df SumsOfSqs MeanSqs F.Model R2 Pr(>F)
## type 1 1.2689 1.26886 4.4608 0.13274 0.002 **
## type:geomorpho 2 1.1789 0.58947 2.0724 0.12333 0.002 **
## Residuals 25 7.1111 0.28444 0.74392
## Total 28 9.5589 1.00000
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
adonis(otu ~ type/site, strata= samp_data$type, data = samp_data)##
## Call:
## adonis(formula = otu ~ type/site, data = samp_data, strata = samp_data$type)
##
## Blocks: strata
## Permutation: free
## Number of permutations: 999
##
## Terms added sequentially (first to last)
##
## Df SumsOfSqs MeanSqs F.Model R2 Pr(>F)
## type 1 1.2689 1.26886 5.3586 0.13274 0.001 ***
## type:site 8 3.7911 0.47388 2.0013 0.39660 0.001 ***
## Residuals 19 4.4990 0.23679 0.47066
## Total 28 9.5589 1.00000
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
adonis(otu ~ type/substrat, strata= samp_data$type, data = samp_data)##
## Call:
## adonis(formula = otu ~ type/substrat, data = samp_data, strata = samp_data$type)
##
## Blocks: strata
## Permutation: free
## Number of permutations: 999
##
## Terms added sequentially (first to last)
##
## Df SumsOfSqs MeanSqs F.Model R2 Pr(>F)
## type 1 1.2689 1.26886 4.5678 0.13274 0.003 **
## type:substrat 3 1.6232 0.54108 1.9479 0.16982 0.003 **
## Residuals 24 6.6668 0.27778 0.69744
## Total 28 9.5589 1.00000
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
adonis(otu ~ type/island, strata= samp_data$type, data = samp_data)##
## Call:
## adonis(formula = otu ~ type/island, data = samp_data, strata = samp_data$type)
##
## Blocks: strata
## Permutation: free
## Number of permutations: 999
##
## Terms added sequentially (first to last)
##
## Df SumsOfSqs MeanSqs F.Model R2 Pr(>F)
## type 1 1.2689 1.26886 4.3012 0.13274 0.017 *
## type:island 1 0.6200 0.61997 2.1016 0.06486 0.017 *
## Residuals 26 7.6701 0.29500 0.80240
## Total 28 9.5589 1.00000
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
We have 3 families of herbivores : the scrappers Scaridae and the browsers Acanthuridae and Siganidae. Unfortunately, the only individuals present in the impacted shifted sites were three Scarus ghobban. Nevertheless, we tried to determine which were the other determinant in each components and test the effect of other environmental parameters.
dir_data_cleaning <- paste0(path, "/analyses/04_data_cleaning/Fish/Shift/Herb/")
dir.create(dir_data_cleaning, recursive = T)We need to subset from the sey_gut phyloseq object the herbivorous fishes.
herb_ps <- subset_samples(sey_gut, diet3 == "Herbivorous")
herb_ps <- prune_taxa(names(which(colSums(herb_ps@otu_table)>0)), herb_ps)
save(herb_ps, file = paste0(dir_data_cleaning, "herb_ps.RData"))
rff_herb <- prune_samples(sample_sums(herb_ps) >= min(sample_sums(herb_ps)) , herb_ps)
rff_herb <- rarefy_even_depth(herb_ps ,min(sample_sums(herb_ps)))
save(rff_herb , file = paste0(dir_data_cleaning, "rff_herb.RData"))
herb_core <- subset_samples(sey_gut_core, diet3 == "Herbivorous")
herb_core <- prune_taxa(names(which(colSums(herb_core@otu_table)>0)), herb_core)
save(herb_core, file = paste0(dir_data_cleaning, "herb_core.RData"))box1 = plot_richness(rff_herb , measures = c("Observed","Shannon") , color = "geomorpho")
box1$data[box1$data$variable == "Shannon",]$value = exp(box1$data[box1$data$variable == "Shannon",]$value)
levels(box1$data$variable)= c("Observed richness","Shannon (ENS)")
levels(box1$data$geomorpho) = c("C" , "M")
palette = c("darkblue","darkred")
sey_herb_alpha_div= ggplot(box1$data, aes(x = factor(geomorpho) , y = value, color = factor(geomorpho))) +
geom_jitter(position = position_jitter(width = .20), alpha = 0.5, size = 3) +
scale_colour_manual(values= palette) + theme_bw()+
geom_boxplot(alpha=0.1, outlier.colour = NA) +
theme(axis.title.x = element_blank(),
axis.title.y = element_blank(),
axis.text = element_text(family = "serif",size = 14),
axis.text.x = element_text(family = "serif",size = 24, angle = 0),
axis.text.y = element_text(family = "serif",size = 20, angle = 0),
panel.grid.major = element_blank(),
panel.grid.minor = element_blank(),
legend.position="none",
strip.text.x = element_text(size=15, family = "serif")) +
facet_wrap( ~ variable, nrow=1, ncol=5, scales = "free") +
geom_signif(comparisons = list(c("C", "M")), map_signif_level = TRUE,
textsize=5, color="black", family = "serif", vjust = 0.1)
sey_herb_alpha_div # ____________ Observed richness MW
MW.obs = wilcox.test(box1$data$value[box1$data$variable=="Observed richness"]~box1$data$geomorpho[box1$data$variable=="Observed richness"])
MW.obs = cbind(MW.obs$statistic, MW.obs$p.value)
# ____________ Shannon number MW
MW.shannon = wilcox.test(box1$data$value[box1$data$variable=="Shannon (ENS)"]~box1$data$geomorpho[box1$data$variable=="Shannon (ENS)"])
MW.shannon =cbind(MW.shannon$statistic, MW.shannon$p.value)
# ____________ All computed
MW.test <- rbind(MW.obs, MW.shannon)
rownames(MW.test) = c("Observed richness","Shannon (ENS)")
colnames(MW.test) = c('W stat' , 'p-value')
MW.test## W stat p-value
## Observed richness 235.5 0.9248596
## Shannon (ENS) 234.0 0.8979202
core_herb_rel <- transform_sample_counts(herb_core, function(x) x / sum(x) )
save(core_herb_rel , file = paste0(dir_data_cleaning, "core_herb_rel.RData"))
# _____ PCOA coda & permanova ------
matrix <- vegdist(core_herb_rel@otu_table, method = "bray")
save(matrix, file = paste0(dir_data_cleaning, "beta_matrices.RData"))
pcoa.sub <- pcoa(matrix)
save(pcoa.sub, file = paste0(dir_data_cleaning, "pcoa.values.RData"))
# Contruction of the table for graphic
pcoa_coord <- pcoa.sub$vectors[,1:3]
# _____ PCOA Plot-----
library(stringr)
samp_data <- data.frame(sample_data(core_herb_rel))
save(samp_data,file = paste0(dir_data_cleaning, "samp_data.RData"))
#write.table(samp_data, file = paste0(dir_data_cleaning, "samp_data.txt"), quote = F, sep = "\t")
hull <- cbind(pcoa_coord, samp_data)
hull$geomorpho <- str_replace_all(hull$geomorpho, c("macroalgal"= "M" , "coral"="C"))
paste("Axis 1 :",percent(pcoa.sub$values$Relative_eig[1])) ## [1] "Axis 1 : 13%"
paste("Axis 2 :",percent(pcoa.sub$values$Relative_eig[2])) ## [1] "Axis 2 : 10%"
paste("Axis 3 :",percent(pcoa.sub$values$Relative_eig[3]))## [1] "Axis 3 : 8%"
paste("Axis 4 :",percent(pcoa.sub$values$Relative_eig[4]))## [1] "Axis 4 : 7%"
pcoa_herb <- ggplot() +
geom_hline(yintercept=0, colour="lightgrey", linetype = 2) +
geom_vline(xintercept=0, colour="lightgrey", linetype = 2) +
geom_point(data = hull, aes(x=Axis.1, y=Axis.2, color = geomorpho, shape = family), alpha = 0.7, size = 8) +
scale_color_manual(values = palette)+
scale_shape_manual(values = c(16,17,18))+
xlab(paste("PCo1 (", round(pcoa.sub$values$Relative_eig[1]*100, 1), "%)")) +
ylab(paste("PCo2 (", round(pcoa.sub$values$Relative_eig[2]*100, 1), "%)")) +
theme_bw() +
coord_equal() +
theme(axis.title.x = element_text(size=20, family = "serif"), # remove x-axis labels
axis.title.y = element_text(size=20 , family = "serif"),
axis.text.x = element_text(size = 18, family = "serif"),
axis.text.y = element_text(size = 18, family = "serif"),
legend.text = element_text(size = 16, family = "serif"),
legend.title = element_text(size = 16,family = "serif"), # remove y-axis labels
panel.background = element_blank(),
panel.grid.major = element_blank(), #remove major-grid labels
panel.grid.minor = element_blank(), #remove minor-grid labels
plot.background = element_blank(),
axis.title = element_text(size = 25))+
labs(colour = "Reef")
pcoa_herbPermanova on the only effect of the geomorpho but there is a lot of variability…
adonis(matrix ~ geomorpho, data = samp_data)##
## Call:
## adonis(formula = matrix ~ geomorpho, data = samp_data)
##
## Permutation: free
## Number of permutations: 999
##
## Terms added sequentially (first to last)
##
## Df SumsOfSqs MeanSqs F.Model R2 Pr(>F)
## geomorpho 1 1.2963 1.29629 3.0248 0.06718 0.001 ***
## Residuals 42 17.9994 0.42856 0.93282
## Total 43 19.2957 1.00000
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
PERMANOVAs on the taxonomic effect between genus and species within families
adonis(matrix ~ family/genus/tax1, strata= samp_data$family, data = samp_data)##
## Call:
## adonis(formula = matrix ~ family/genus/tax1, data = samp_data, strata = samp_data$family)
##
## Blocks: strata
## Permutation: free
## Number of permutations: 999
##
## Terms added sequentially (first to last)
##
## Df SumsOfSqs MeanSqs F.Model R2 Pr(>F)
## family 2 2.7390 1.36952 3.8316 0.14195 0.001 ***
## family:genus 3 1.8726 0.62420 1.7464 0.09705 0.001 ***
## family:genus:tax1 7 3.6037 0.51481 1.4403 0.18676 0.009 **
## Residuals 31 11.0803 0.35743 0.57424
## Total 43 19.2957 1.00000
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
And between species/site , between species/geomorpho, and species/substrata inside families
adonis(matrix ~ tax1/site, strata= samp_data$family, data = samp_data)##
## Call:
## adonis(formula = matrix ~ tax1/site, data = samp_data, strata = samp_data$family)
##
## Blocks: strata
## Permutation: free
## Number of permutations: 999
##
## Terms added sequentially (first to last)
##
## Df SumsOfSqs MeanSqs F.Model R2 Pr(>F)
## tax1 12 8.2153 0.68461 2.1580 0.42576 0.001 ***
## tax1:site 11 4.7356 0.43051 1.3571 0.24542 0.011 *
## Residuals 20 6.3447 0.31724 0.32882
## Total 43 19.2957 1.00000
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
adonis(matrix ~ tax1/geomorpho, strata= samp_data$family, data = samp_data)##
## Call:
## adonis(formula = matrix ~ tax1/geomorpho, data = samp_data, strata = samp_data$family)
##
## Blocks: strata
## Permutation: free
## Number of permutations: 999
##
## Terms added sequentially (first to last)
##
## Df SumsOfSqs MeanSqs F.Model R2 Pr(>F)
## tax1 12 8.2153 0.68461 1.9470 0.42576 0.001 ***
## tax1:geomorpho 2 0.8835 0.44175 1.2564 0.04579 0.134
## Residuals 29 10.1968 0.35162 0.52845
## Total 43 19.2957 1.00000
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
adonis(matrix ~ tax1/substrat, strata= samp_data$family, data = samp_data)##
## Call:
## adonis(formula = matrix ~ tax1/substrat, data = samp_data, strata = samp_data$family)
##
## Blocks: strata
## Permutation: free
## Number of permutations: 999
##
## Terms added sequentially (first to last)
##
## Df SumsOfSqs MeanSqs F.Model R2 Pr(>F)
## tax1 12 8.2153 0.68461 2.0259 0.42576 0.002 **
## tax1:substrat 9 3.6458 0.40509 1.1987 0.18894 0.089 .
## Residuals 22 7.4345 0.33793 0.38530
## Total 43 19.2957 1.00000
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
There is a significant effect of the taxonomic effect even at family level. To erase this effect, we focused our tests on Scaridae members.
dir_data_cleaning <- paste0(path, "/analyses/04_data_cleaning/Fish/Shift/Scaridae/")
dir.create(dir_data_cleaning, recursive = T)
scaridae_ps <- subset_samples(sey_gut, family == "Scaridae")
scaridae_ps <- prune_taxa(names(which(colSums(scaridae_ps@otu_table)>0)), scaridae_ps)
save(scaridae_ps, file = paste0(dir_data_cleaning, "scaridae_ps.RData"))
scaridaerff <- prune_samples(sample_sums(scaridae_ps) >= min(sample_sums(scaridae_ps)) , scaridae_ps)
scaridaerff <- rarefy_even_depth(scaridae_ps ,min(sample_sums(scaridae_ps)))
save(scaridaerff , file = paste0(dir_data_cleaning, "scaridaerff.RData"))
scar_core <- subset_samples(sey_gut_core, family == "Scaridae")
scar_core <- prune_taxa(names(which(colSums(scar_core@otu_table)>0)), scar_core)
save(scar_core, file = paste0(dir_data_cleaning, "scar_core.RData"))box1 = plot_richness(scaridaerff , measures = c("Observed","Shannon") , color = "geomorpho")
# ____ Shannon index (ENS)
box1$data[box1$data$variable == "Shannon",]$value## [1] 2.1957948 1.4266784 3.2637111 4.5718229 4.4840800 2.6830258 3.3967683
## [8] 3.5768533 0.8209879 2.9114264 3.6864817 2.0008311 4.2034822 0.5042341
## [15] 4.1367648 2.8572218 2.8317726 4.0120393 4.5738187 2.7881697 2.7447567
## [22] 2.5988603
box1$data[box1$data$variable == "Shannon",]$value = exp(box1$data[box1$data$variable == "Shannon",]$value)
levels(box1$data$variable)= c("Observed richness","Shannon (ENS)")
levels(box1$data$geomorpho) = c("C" , "M")
palette = c("darkblue","darkred")
sey_scaridae_alpha_div=ggplot(box1$data, aes(x = factor(geomorpho) , y = value, color = factor(geomorpho))) + geom_jitter(position = position_jitter(width = .20), alpha = 0.5, size = 3) +
scale_colour_manual(values= palette) + theme_bw()+
geom_boxplot(alpha=0.1, outlier.colour = NA) +
theme(axis.title.x = element_blank(),
axis.title.y = element_blank(),
axis.text = element_text(family = "serif",size = 14),
axis.text.x = element_text(family = "serif",size = 24, angle = 0),
axis.text.y = element_text(family = "serif",size = 20, angle = 0),
panel.grid.major = element_blank(),
panel.grid.minor = element_blank(),
legend.position="none",
strip.text.x = element_text(size=15, family = "serif")) +
facet_wrap( ~ variable, nrow=1, ncol=5, scales = "free") +
geom_signif(comparisons = list(c("C", "M")), map_signif_level = TRUE, textsize=5, color="black", family = "serif")
sey_scaridae_alpha_div # ____ Test MW ----
library(pgirmess)
# Does exist differences in diversity between types of host?
# ____________ Observed richness MW
MW.obs = wilcox.test(box1$data$value[box1$data$variable=="Observed richness"]~box1$data$geomorpho[box1$data$variable=="Observed richness"])
MW.obs = cbind(MW.obs$statistic, MW.obs$p.value)
# ____________ Shannon number MW
MW.shannon = wilcox.test(box1$data$value[box1$data$variable=="Shannon (ENS)"]~box1$data$geomorpho[box1$data$variable=="Shannon (ENS)"])
MW.shannon =cbind(MW.shannon$statistic, MW.shannon$p.value)
# ____________ All computed
MW.test <- rbind(MW.obs, MW.shannon)
rownames(MW.test) = c("Observed richness","Shannon (ENS)")
colnames(MW.test) = c('W stat' , 'p-value')
MW.test## W stat p-value
## Observed richness 40.5 0.2710926
## Shannon (ENS) 42.0 0.2259740
scar_core_rel <- transform_sample_counts(scar_core, function(x) x / sum(x) )
save(scar_core_rel , file = paste0(dir_data_cleaning, "scar_core_rel.RData"))
# _____ PCOA coda & permanova ------
matrix <- vegdist(scar_core_rel@otu_table, method = "bray")
save(matrix, file = paste0(dir_data_cleaning, "beta_matrices.RData"))
pcoa.sub <- pcoa(matrix)
save(pcoa.sub, file = paste0(dir_data_cleaning, "pcoa.values.RData"))
# Contruction of the table for graphic
pcoa_coord <- pcoa.sub$vectors[,1:3]
# _____ PCOA Plot-----
library(stringr)
samp_data <- data.frame(sample_data(scar_core_rel))
save(samp_data,file = paste0(dir_data_cleaning, "samp_data.RData"))
#write.table(samp_data , file = paste0(dir_data_cleaning, "samp_data.txt"), sep = "\t", quote = F)
hull <- cbind(pcoa_coord, samp_data)
hull$geomorpho <- str_replace_all(hull$geomorpho, c("macroalgal"= "M" , "coral"="C"))
# What is the percentage of the explicative variance?
paste("Axis 1 :",percent(pcoa.sub$values$Relative_eig[1])) ## [1] "Axis 1 : 21%"
paste("Axis 2 :",percent(pcoa.sub$values$Relative_eig[2])) ## [1] "Axis 2 : 12%"
paste("Axis 3 :",percent(pcoa.sub$values$Relative_eig[3]))## [1] "Axis 3 : 10%"
paste("Axis 4 :",percent(pcoa.sub$values$Relative_eig[4]))## [1] "Axis 4 : 9%"
pcoa_scaridae <- ggplot() +
geom_hline(yintercept=0, colour="lightgrey", linetype = 2) +
geom_vline(xintercept=0, colour="lightgrey", linetype = 2) +
geom_point(data = hull, aes(x=Axis.1, y=Axis.2, color = tax1, shape = site), alpha = 0.7, size = 8) +
scale_shape_manual(values = c(15:19))+
xlab(paste("PCo1 (", round(pcoa.sub$values$Relative_eig[1]*100, 1), "%)")) +
ylab(paste("PCo2 (", round(pcoa.sub$values$Relative_eig[2]*100, 1), "%)")) +
theme_bw() +
coord_equal() +
theme(axis.title.x = element_text(size=20, family = "serif"), # remove x-axis labels
axis.title.y = element_text(size=20 , family = "serif"),
axis.text.x = element_text(size = 18, family = "serif"),
axis.text.y = element_text(size = 18, family = "serif"),
legend.text = element_text(size = 16, family = "serif"),
legend.title = element_text(size = 16,family = "serif"), # remove y-axis labels
panel.background = element_blank(),
panel.grid.major = element_blank(), #remove major-grid labels
panel.grid.minor = element_blank(), #remove minor-grid labels
plot.background = element_blank(),
axis.title = element_text(size = 25))+
labs(colour = "Reef", shape = "Site")
pcoa_scaridaeWe will test the effect of the taxonomical levels (genus and species)
adonis(matrix ~ tax1,data = samp_data)##
## Call:
## adonis(formula = matrix ~ tax1, data = samp_data)
##
## Permutation: free
## Number of permutations: 999
##
## Terms added sequentially (first to last)
##
## Df SumsOfSqs MeanSqs F.Model R2 Pr(>F)
## tax1 7 3.7221 0.53173 1.4168 0.41465 0.012 *
## Residuals 14 5.2543 0.37531 0.58535
## Total 21 8.9764 1.00000
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
adonis(matrix ~ genus, data = samp_data)##
## Call:
## adonis(formula = matrix ~ genus, data = samp_data)
##
## Permutation: free
## Number of permutations: 999
##
## Terms added sequentially (first to last)
##
## Df SumsOfSqs MeanSqs F.Model R2 Pr(>F)
## genus 2 1.2896 0.64480 1.5938 0.14367 0.006 **
## Residuals 19 7.6868 0.40457 0.85633
## Total 21 8.9764 1.00000
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Because there is a high taxonomic effect, we will “freeze” the genus effect to test the environmental effect (meaning that we test the substrate, geomorpho and site effect inside each genus). Even species have significantly different core microbial composition, We couldn’t test inside species level because we many species were only represented by one individual.
adonis(matrix ~ genus/tax1, strata = samp_data$genus, data = samp_data)##
## Call:
## adonis(formula = matrix ~ genus/tax1, data = samp_data, strata = samp_data$genus)
##
## Blocks: strata
## Permutation: free
## Number of permutations: 999
##
## Terms added sequentially (first to last)
##
## Df SumsOfSqs MeanSqs F.Model R2 Pr(>F)
## genus 2 1.2896 0.64480 1.7181 0.14367 0.099 .
## genus:tax1 5 2.4325 0.48649 1.2962 0.27099 0.099 .
## Residuals 14 5.2543 0.37531 0.58535
## Total 21 8.9764 1.00000
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
adonis(matrix ~ genus/substrat, strata = samp_data$genus, data = samp_data)##
## Call:
## adonis(formula = matrix ~ genus/substrat, data = samp_data, strata = samp_data$genus)
##
## Blocks: strata
## Permutation: free
## Number of permutations: 999
##
## Terms added sequentially (first to last)
##
## Df SumsOfSqs MeanSqs F.Model R2 Pr(>F)
## genus 2 1.2896 0.64480 1.7302 0.14367 0.011 *
## genus:substrat 3 1.7241 0.57470 1.5421 0.19207 0.011 *
## Residuals 16 5.9627 0.37267 0.66426
## Total 21 8.9764 1.00000
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
adonis(matrix ~ genus/site, strata = samp_data$genus, data = samp_data)##
## Call:
## adonis(formula = matrix ~ genus/site, data = samp_data, strata = samp_data$genus)
##
## Blocks: strata
## Permutation: free
## Number of permutations: 999
##
## Terms added sequentially (first to last)
##
## Df SumsOfSqs MeanSqs F.Model R2 Pr(>F)
## genus 2 1.2896 0.64480 1.8528 0.14367 0.001 ***
## genus:site 5 2.8145 0.56289 1.6174 0.31354 0.001 ***
## Residuals 14 4.8723 0.34802 0.54279
## Total 21 8.9764 1.00000
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Finally, the only herbivorous species which was in both type of reefs was the Scarus ghobban, who eats the epilithic part of the coral, searching for crustalose calcarous algae.
dir_data_cleaning <- paste0(path,"/analyses/04_data_cleaning/Fish/Shift/SCG/")
dir.create(dir_data_cleaning, recursive = T)
SCG <- subset_samples(sey_gut, tax1 == "Scarus ghobban")
SCG <- prune_taxa(names(which(colSums(SCG@otu_table)>0)), SCG)
save(SCG, file = paste0(dir_data_cleaning, "SCG.RData"))
SCGrff <- prune_samples(sample_sums(SCG) >= min(sample_sums(SCG)), SCG)
SCGrff <- rarefy_even_depth(SCGrff, sample.size = min(sample_sums(SCG)))
save(SCGrff, file = paste0(dir_data_cleaning, "SCGrff_ps.RData"))
core_SCG <- subset_samples(sey_gut_core, tax1 == "Scarus ghobban")
core_SCG <- prune_taxa(names(which(colSums(core_SCG@otu_table)>0)), core_SCG)
save(core_SCG, file = paste0(dir_data_cleaning, "core_SCG.RData"))box1 = plot_richness(SCGrff , measures = c("Observed","Shannon") , color = "geomorpho")
# ____ Shannon index (ENS)
box1$data[box1$data$variable == "Shannon",]$value = exp(box1$data[box1$data$variable == "Shannon",]$value)
levels(box1$data$variable)= c("Observed richness","Shannon (ENS)")
levels(box1$data$geomorpho) = c("C" , "M")
library(ggsignif)
library(ggplot2)
palette = c("darkblue","darkred")
sey_SCG_alpha_div= ggplot(box1$data, aes(x = factor(geomorpho) , y = value, color = factor(geomorpho))) +
geom_jitter(position = position_jitter(width = .20), alpha = 0.5, size = 3) +
scale_colour_manual(values= palette) + theme_bw()+
geom_boxplot(alpha=0.1, outlier.colour = NA) +
theme(axis.title.x = element_blank(),
axis.title.y = element_blank(),
axis.text = element_text(family = "serif",size = 14),
axis.text.x = element_text(family = "serif",size = 24, angle = 0),
axis.text.y = element_text(family = "serif",size = 20, angle = 0),
panel.grid.major = element_blank(),
panel.grid.minor = element_blank(),
legend.position="none",
strip.text.x = element_text(size=15, family = "serif")) +
facet_wrap( ~ variable, nrow=1, ncol=5, scales = "free") +
geom_signif(comparisons = list(c("C", "M")),
map_signif_level = TRUE, textsize=5, color="black", family = "serif")
sey_SCG_alpha_divMW.obs = wilcox.test(box1$data$value[box1$data$variable=="Observed richness"]~box1$data$geomorpho[box1$data$variable=="Observed richness"])
MW.obs = cbind(MW.obs$statistic, MW.obs$p.value)
# ____________ Shannon number MW
MW.shannon = wilcox.test(box1$data$value[box1$data$variable=="Shannon (ENS)"]~box1$data$geomorpho[box1$data$variable=="Shannon (ENS)"])
MW.shannon =cbind(MW.shannon$statistic, MW.shannon$p.value)
# ____________ All computed
MW.test <- rbind(MW.obs, MW.shannon)
rownames(MW.test) = c("Observed richness","Shannon (ENS)")
colnames(MW.test) = c('W stat' , 'p-value')
MW.test## W stat p-value
## Observed richness 8 0.6285714
## Shannon (ENS) 9 0.4000000
core_SCG_rel <- transform_sample_counts(core_SCG, function(x) x / sum(x) )
save(core_SCG_rel , file = paste0(dir_data_cleaning, "core_SCG_rel.RData"))
# _____ PCOA coda & permanova ------
matrix <- vegdist(core_SCG_rel@otu_table, method = "bray")
save(matrix, file = paste0(dir_data_cleaning, "beta_matrices.RData"))
pcoa.sub <- pcoa(matrix)
save(pcoa.sub, file = paste0(dir_data_cleaning, "pcoa.values.RData"))
# Contruction of the table for graphic
pcoa_coord <- pcoa.sub$vectors[,1:3]
# _____ PCOA Plot-----
library(stringr)
samp_data <- data.frame(sample_data(core_SCG_rel))
save(samp_data,file = paste0(dir_data_cleaning, "samp_data.RData"))
#write.table(samp_data, file = paste0(dir_data_cleaning, "samp_data.txt"), sep= "\t", quote= F)
hull <- cbind(pcoa_coord, samp_data)
hull$geomorpho <- str_replace_all(hull$geomorpho, c("macroalgal"= "M" , "coral"="C"))
# What is the percentage of the explicative variance?
paste("Axis 1 :",percent(pcoa.sub$values$Relative_eig[1])) ## [1] "Axis 1 : 37%"
paste("Axis 2 :",percent(pcoa.sub$values$Relative_eig[2])) ## [1] "Axis 2 : 24%"
paste("Axis 3 :",percent(pcoa.sub$values$Relative_eig[3]))## [1] "Axis 3 : 18%"
paste("Axis 4 :",percent(pcoa.sub$values$Relative_eig[4]))## [1] "Axis 4 : 15%"
pcoa_SCG <- ggplot() +
geom_hline(yintercept=0, colour="lightgrey", linetype = 2) +
geom_vline(xintercept=0, colour="lightgrey", linetype = 2) +
geom_point(data = hull, aes(x=Axis.1, y=Axis.2, color = geomorpho), alpha = 0.7, size = 8, shape = 16) +
scale_color_manual(values = c("Darkblue","Darkred"))+
xlab(paste("PCo1 (", round(pcoa.sub$values$Relative_eig[1]*100, 1), "%)")) +
ylab(paste("PCo2 (", round(pcoa.sub$values$Relative_eig[2]*100, 1), "%)")) +
theme_bw() +
coord_equal() +
theme(axis.title.x = element_text(size=20, family = "serif"), # remove x-axis labels
axis.title.y = element_text(size=20 , family = "serif"),
axis.text.x = element_text(size = 18, family = "serif"),
axis.text.y = element_text(size = 18, family = "serif"),
legend.text = element_text(size = 16, family = "serif"),
legend.title = element_text(size = 16,family = "serif"), # remove y-axis labels
panel.background = element_blank(),
panel.grid.major = element_blank(), #remove major-grid labels
panel.grid.minor = element_blank(), #remove minor-grid labels
plot.background = element_blank(),
axis.title = element_text(size = 25))+
labs(colour = "Reef")
pcoa_SCGWe test the effect of the reef at host level.
adonis(matrix ~ geomorpho, data = samp_data)##
## Call:
## adonis(formula = matrix ~ geomorpho, data = samp_data)
##
## Permutation: free
## Number of permutations: 5039
##
## Terms added sequentially (first to last)
##
## Df SumsOfSqs MeanSqs F.Model R2 Pr(>F)
## geomorpho 1 0.70582 0.70582 1.9293 0.27843 0.106
## Residuals 5 1.82916 0.36583 0.72157
## Total 6 2.53498 1.00000
Even if we can’t test the effect of the reef on the other herbivores family, we can test the site, island and substrat effect on the Siganidae.
dir_data_cleaning <- paste0(path, "/analyses/04_data_cleaning/Fish/Shift/Siganidae/")
dir.create(dir_data_cleaning, recursive = T)
siganidae_ps <- subset_samples(sey_gut, family == "Siganidae")
siganidae_ps <- prune_taxa(names(which(colSums(siganidae_ps@otu_table)>0)), siganidae_ps)
save(siganidae_ps, file = paste0(dir_data_cleaning, "siganidae_ps.RData"))
siganidaerff <- prune_samples(sample_sums(siganidae_ps) >= min(sample_sums(siganidae_ps)) , siganidae_ps)
siganidaerff <- rarefy_even_depth(siganidae_ps ,min(sample_sums(siganidae_ps)))
save(siganidaerff , file = paste0(dir_data_cleaning, "siganidaerff.RData"))
sig_core <- subset_samples(sey_gut_core, family == "Siganidae")
sig_core <- prune_taxa(names(which(colSums(sig_core@otu_table)>0)), sig_core)
save(sig_core, file = paste0(dir_data_cleaning, "sig_core.RData"))sig_core_rel <- transform_sample_counts(sig_core, function(x) x / sum(x) )
save(sig_core_rel , file = paste0(dir_data_cleaning, "sig_core_rel.RData"))
matrix <- vegdist(sig_core_rel@otu_table, method = "bray")
save(matrix, file = paste0(dir_data_cleaning, "beta_matrices.RData"))
pcoa.sub <- pcoa(matrix)
save(pcoa.sub, file = paste0(dir_data_cleaning, "pcoa.values.RData"))
# Contruction of the table for graphic
pcoa_coord <- pcoa.sub$vectors[,1:3]
# _____ PCOA Plot-----
library(stringr)
samp_data <- data.frame(sample_data(sig_core_rel))
save(samp_data,file = paste0(dir_data_cleaning, "samp_data.RData"))
#write.table(samp_data, file = paste0(dir_data_cleaning, "samp_data.txt"), quote = F, sep = "\t")
hull <- cbind(pcoa_coord, samp_data)
# What is the percentage of the explicative variance?
paste("Axis 1 :",percent(pcoa.sub$values$Relative_eig[1])) ## [1] "Axis 1 : 23%"
paste("Axis 2 :",percent(pcoa.sub$values$Relative_eig[2])) ## [1] "Axis 2 : 15%"
paste("Axis 3 :",percent(pcoa.sub$values$Relative_eig[3]))## [1] "Axis 3 : 12%"
paste("Axis 4 :",percent(pcoa.sub$values$Relative_eig[4]))## [1] "Axis 4 : 10%"
pcoa_sig <- ggplot() +
geom_hline(yintercept=0, colour="lightgrey", linetype = 2) +
geom_vline(xintercept=0, colour="lightgrey", linetype = 2) +
geom_point(data = hull, aes(x=Axis.1, y=Axis.2, color = tax1, shape = site),alpha = 0.7, size = 8) +
scale_shape_manual(values = c(18,15,16,17))+
xlab(paste("PCo1 (", round(pcoa.sub$values$Relative_eig[1]*100, 1), "%)")) +
ylab(paste("PCo2 (", round(pcoa.sub$values$Relative_eig[2]*100, 1), "%)")) +
theme_bw() +
coord_equal() +
theme(axis.title.x = element_text(size=20, family = "serif"), # remove x-axis labels
axis.title.y = element_text(size=20 , family = "serif"),
axis.text.x = element_text(size = 18, family = "serif"),
axis.text.y = element_text(size = 18, family = "serif"),
legend.text = element_text(size = 16, family = "serif"),
legend.title = element_text(size = 16,family = "serif"), # remove y-axis labels
panel.background = element_blank(),
panel.grid.major = element_blank(), #remove major-grid labels
panel.grid.minor = element_blank(), #remove minor-grid labels
plot.background = element_blank(),
axis.title = element_text(size = 25))+
labs(colour = "Species", shape = "Site")
pcoa_sigadonis(matrix ~ tax1,data = samp_data)##
## Call:
## adonis(formula = matrix ~ tax1, data = samp_data)
##
## Permutation: free
## Number of permutations: 999
##
## Terms added sequentially (first to last)
##
## Df SumsOfSqs MeanSqs F.Model R2 Pr(>F)
## tax1 2 1.1712 0.58561 1.7057 0.19593 0.019 *
## Residuals 14 4.8066 0.34333 0.80407
## Total 16 5.9778 1.00000
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
adonis(matrix ~ island, data = samp_data)##
## Call:
## adonis(formula = matrix ~ island, data = samp_data)
##
## Permutation: free
## Number of permutations: 999
##
## Terms added sequentially (first to last)
##
## Df SumsOfSqs MeanSqs F.Model R2 Pr(>F)
## island 1 0.3801 0.38012 1.0186 0.06359 0.393
## Residuals 15 5.5977 0.37318 0.93641
## Total 16 5.9778 1.00000
adonis(matrix ~ site, data = samp_data)##
## Call:
## adonis(formula = matrix ~ site, data = samp_data)
##
## Permutation: free
## Number of permutations: 999
##
## Terms added sequentially (first to last)
##
## Df SumsOfSqs MeanSqs F.Model R2 Pr(>F)
## site 3 1.4130 0.47100 1.3413 0.23637 0.084 .
## Residuals 13 4.5648 0.35114 0.76363
## Total 16 5.9778 1.00000
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
adonis(matrix ~ substrat, data = samp_data)##
## Call:
## adonis(formula = matrix ~ substrat, data = samp_data)
##
## Permutation: free
## Number of permutations: 999
##
## Terms added sequentially (first to last)
##
## Df SumsOfSqs MeanSqs F.Model R2 Pr(>F)
## substrat 1 0.3801 0.38012 1.0186 0.06359 0.419
## Residuals 15 5.5977 0.37318 0.93641
## Total 16 5.9778 1.00000
Now let’s see the influence of the shift on the carnivores which were more ubiquitous between sites and reef conditions.
dir_data_cleaning <- paste0(path, "/analyses/04_data_cleaning/Fish/Shift/Carnivores/")
dir.create(dir_data_cleaning, recursive = T)We need to subset from the sey_gut phyloseq object the carnivorous fishes.
carn_ps <- subset_samples(sey_gut, diet3 == "Carnivorous")
carn_ps <- prune_taxa(names(which(colSums(carn_ps@otu_table)>0)), carn_ps)
save(carn_ps, file = paste0(dir_data_cleaning, "carn_ps.RData"))
rff_carn <- prune_samples(sample_sums(carn_ps) >= min(sample_sums(carn_ps)) , carn_ps)
rff_carn <- rarefy_even_depth(carn_ps ,min(sample_sums(carn_ps)))
save(rff_carn , file = paste0(dir_data_cleaning, "rff_carn.RData"))
carn_core <- subset_samples(sey_gut_core, diet3 == "Carnivorous")
carn_core <- prune_taxa(names(which(colSums(carn_core@otu_table)>0)), carn_core)
save(carn_core, file = paste0(dir_data_cleaning, "carn_core.RData"))box1 = plot_richness(rff_carn , measures = c("Observed","Shannon") , color = "geomorpho")
# ____ Shannon index (ENS)
box1$data[box1$data$variable == "Shannon",]$value = exp(box1$data[box1$data$variable == "Shannon",]$value)
levels(box1$data$variable)= c("Observed richness","Shannon (ENS)")
levels(box1$data$geomorpho) = c("C" , "M")
library(ggsignif)
library(ggplot2)
palette = c("darkblue","darkred")
sey_carn_alpha_div= ggplot(box1$data, aes(x = factor(geomorpho) , y = value, color = factor(geomorpho))) +
geom_jitter(position = position_jitter(width = .20), alpha = 0.5, size = 3) +
scale_colour_manual(values= palette) + theme_bw()+
geom_boxplot(alpha=0.1, outlier.colour = NA) +
theme(axis.title.x = element_blank(),
axis.title.y = element_blank(),
axis.text = element_text(family = "serif",size = 14),
axis.text.x = element_text(family = "serif",size = 24, angle = 0),
axis.text.y = element_text(family = "serif",size = 20, angle = 0),
panel.grid.major = element_blank(),
panel.grid.minor = element_blank(),
legend.position="none",
strip.text.x = element_text(size=15, family = "serif")) +
facet_wrap( ~ variable, nrow=1, ncol=5, scales = "free") +
geom_signif(comparisons = list(c("C", "M")), map_signif_level = TRUE, textsize=5, color="black", family = "serif")
sey_carn_alpha_div # ____ Test MW ----
library(pgirmess)
# Does exist differences in diversity between types of host?
# ____________ Observed richness MW
MW.obs = wilcox.test(box1$data$value[box1$data$variable=="Observed richness"]~box1$data$geomorpho[box1$data$variable=="Observed richness"])
MW.obs = cbind(MW.obs$statistic, MW.obs$p.value)
# ____________ Shannon number MW
MW.shannon = wilcox.test(box1$data$value[box1$data$variable=="Shannon (ENS)"]~box1$data$geomorpho[box1$data$variable=="Shannon (ENS)"])
MW.shannon =cbind(MW.shannon$statistic, MW.shannon$p.value)
# ____________ All computed
MW.test <- rbind(MW.obs, MW.shannon)
rownames(MW.test) = c("Observed richness","Shannon (ENS)")
colnames(MW.test) = c('W stat' , 'p-value')
MW.test## W stat p-value
## Observed richness 326 0.6753113
## Shannon (ENS) 363 0.8252230
core_carn_rel <- transform_sample_counts(carn_core, function(x) x / sum(x) )
save(core_carn_rel , file = paste0(dir_data_cleaning, "core_carn_rel.RData"))
matrix <- vegdist(core_carn_rel@otu_table, method = "bray")
save(matrix, file = paste0(dir_data_cleaning, "beta_matrices.RData"))
pcoa.sub <- pcoa(matrix)
save(pcoa.sub, file = paste0(dir_data_cleaning, "pcoa.values.RData"))
# Contruction of the table for graphic
pcoa_coord <- pcoa.sub$vectors[,1:3]
# _____ PCOA Plot-----
library(stringr)
samp_data <- data.frame(sample_data(core_carn_rel))
save(samp_data,file = paste0(dir_data_cleaning, "samp_data.RData"))
#write.table(samp_data, file = paste0(dir_data_cleaning, "samp_data.txt"), quote = F, sep = "\t")
hull <- cbind(pcoa_coord, samp_data)
hull$geomorpho <- str_replace_all(hull$geomorpho, c("macroalgal"= "M" , "coral"="C"))
# What is the percentage of the explicative variance?
paste("Axis 1 :",percent(pcoa.sub$values$Relative_eig[1])) ## [1] "Axis 1 : 10%"
paste("Axis 2 :",percent(pcoa.sub$values$Relative_eig[2])) ## [1] "Axis 2 : 9%"
paste("Axis 3 :",percent(pcoa.sub$values$Relative_eig[3]))## [1] "Axis 3 : 7%"
paste("Axis 4 :",percent(pcoa.sub$values$Relative_eig[4]))## [1] "Axis 4 : 6%"
pcoa_carn <- ggplot() +
geom_hline(yintercept=0, colour="lightgrey", linetype = 2) +
geom_vline(xintercept=0, colour="lightgrey", linetype = 2) +
geom_point(data = hull, aes(x=Axis.1, y=Axis.2, color = geomorpho, shape = family), alpha = 0.7, size = 8) +
scale_color_manual(values = palette)+
scale_shape_manual(values =c(15:18, 8))+
xlab(paste("PCo1 (", round(pcoa.sub$values$Relative_eig[1]*100, 1), "%)")) +
ylab(paste("PCo2 (", round(pcoa.sub$values$Relative_eig[2]*100, 1), "%)")) +
theme_bw() +
coord_equal() +
theme(axis.title.x = element_text(size=20, family = "serif"), # remove x-axis labels
axis.title.y = element_text(size=20 , family = "serif"),
axis.text.x = element_text(size = 18, family = "serif"),
axis.text.y = element_text(size = 18, family = "serif"),
legend.text = element_text(size = 16, family = "serif"),
legend.title = element_text(size = 16,family = "serif"), # remove y-axis labels
panel.background = element_blank(),
panel.grid.major = element_blank(), #remove major-grid labels
panel.grid.minor = element_blank(), #remove minor-grid labels
plot.background = element_blank(),
axis.title = element_text(size = 25))+
labs(colour = "Reef", shape= "Family")
pcoa_carnWe tested first the influence of the environment
adonis(matrix ~ island, data = samp_data)##
## Call:
## adonis(formula = matrix ~ island, data = samp_data)
##
## Permutation: free
## Number of permutations: 999
##
## Terms added sequentially (first to last)
##
## Df SumsOfSqs MeanSqs F.Model R2 Pr(>F)
## island 1 0.5651 0.56509 1.3071 0.02499 0.097 .
## Residuals 51 22.0484 0.43232 0.97501
## Total 52 22.6135 1.00000
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
adonis(matrix ~ geomorpho, data = samp_data)##
## Call:
## adonis(formula = matrix ~ geomorpho, data = samp_data)
##
## Permutation: free
## Number of permutations: 999
##
## Terms added sequentially (first to last)
##
## Df SumsOfSqs MeanSqs F.Model R2 Pr(>F)
## geomorpho 1 0.4216 0.42162 0.96895 0.01864 0.5
## Residuals 51 22.1919 0.43513 0.98136
## Total 52 22.6135 1.00000
adonis(matrix ~ site, data = samp_data)##
## Call:
## adonis(formula = matrix ~ site, data = samp_data)
##
## Permutation: free
## Number of permutations: 999
##
## Terms added sequentially (first to last)
##
## Df SumsOfSqs MeanSqs F.Model R2 Pr(>F)
## site 6 2.9559 0.49265 1.1528 0.13071 0.061 .
## Residuals 46 19.6576 0.42734 0.86929
## Total 52 22.6135 1.00000
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
adonis(matrix ~ substrat, data = samp_data)##
## Call:
## adonis(formula = matrix ~ substrat, data = samp_data)
##
## Permutation: free
## Number of permutations: 999
##
## Terms added sequentially (first to last)
##
## Df SumsOfSqs MeanSqs F.Model R2 Pr(>F)
## substrat 2 0.8248 0.41239 0.94633 0.03647 0.598
## Residuals 50 21.7887 0.43577 0.96353
## Total 52 22.6135 1.00000
And then the taxonomic effect
adonis(matrix ~ family, data = samp_data)##
## Call:
## adonis(formula = matrix ~ family, data = samp_data)
##
## Permutation: free
## Number of permutations: 999
##
## Terms added sequentially (first to last)
##
## Df SumsOfSqs MeanSqs F.Model R2 Pr(>F)
## family 4 2.2134 0.55335 1.302 0.09788 0.015 *
## Residuals 48 20.4001 0.42500 0.90212
## Total 52 22.6135 1.00000
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
adonis(matrix ~ family/genus/tax1 , data =samp_data)##
## Call:
## adonis(formula = matrix ~ family/genus/tax1, data = samp_data)
##
## Permutation: free
## Number of permutations: 999
##
## Terms added sequentially (first to last)
##
## Df SumsOfSqs MeanSqs F.Model R2 Pr(>F)
## family 4 2.2134 0.55335 1.3342 0.09788 0.010 **
## family:genus 7 3.1202 0.44574 1.0747 0.13798 0.159
## family:genus:tax1 10 4.4224 0.44224 1.0663 0.19557 0.184
## Residuals 31 12.8575 0.41476 0.56858
## Total 52 22.6135 1.00000
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
dir_data_cleaning <- paste0(path, "/analyses/04_data_cleaning/Fish/Shift/Lutjanidae/")
dir.create(dir_data_cleaning, recursive = T)
lutj_ps <- subset_samples(sey_gut, family == "Lutjanidae")
lutj_ps <- prune_taxa(names(which(colSums(lutj_ps@otu_table)>0)), lutj_ps)
save(lutj_ps, file = paste0(dir_data_cleaning, "lutj_ps.RData"))
rff_lutj <- prune_samples(sample_sums(lutj_ps) >= min(sample_sums(lutj_ps)) , lutj_ps)
rff_lutj <- rarefy_even_depth(lutj_ps ,min(sample_sums(lutj_ps)))
save(rff_lutj , file = paste0(dir_data_cleaning, "rff_lutj.RData"))
core_lutj <- subset_samples(sey_gut_core, family == "Lutjanidae")
core_lutj <- prune_taxa(names(which(colSums(core_lutj@otu_table)>0)), core_lutj)
save(core_lutj, file = paste0(dir_data_cleaning, "core_lutj.RData"))
# __ Alpha diversity ----
box1 = plot_richness(rff_lutj , measures = c("Observed","Shannon") , color = "geomorpho")
# ____ Shannon index (ENS)
box1$data[box1$data$variable == "Shannon",]$value = exp(box1$data[box1$data$variable == "Shannon",]$value)
levels(box1$data$variable)= c("Observed richness","Shannon (ENS)")
levels(box1$data$geomorpho) = c("C" , "M")
library(ggsignif)
library(ggplot2)
palette = c("darkblue","darkred")
sey_lutj_alpha_div= ggplot(box1$data, aes(x = factor(geomorpho) , y = value, color = factor(geomorpho))) +
geom_jitter(position = position_jitter(width = .20), alpha = 0.5, size = 3) +
scale_colour_manual(values= palette) + theme_bw()+
geom_boxplot(alpha=0.1, outlier.colour = NA) +
theme(axis.title.x = element_blank(),
axis.title.y = element_blank(),
axis.text = element_text(family = "serif",size = 14),
axis.text.x = element_text(family = "serif",size = 24, angle = 0),
axis.text.y = element_text(family = "serif",size = 20, angle = 0),
panel.grid.major = element_blank(),
panel.grid.minor = element_blank(),
legend.position="none",
strip.text.x = element_text(size=15, family = "serif")) +
facet_wrap( ~ variable, nrow=1, ncol=5, scales = "free") +
geom_signif(comparisons = list(c("C", "M")), map_signif_level = TRUE, textsize=5, color="black", family = "serif")
sey_lutj_alpha_div # ____ Test MW ----
library(pgirmess)
# Does exist differences in diversity between types of host?
# ____________ Observed richness MW
MW.obs = wilcox.test(box1$data$value[box1$data$variable=="Observed richness"]~box1$data$geomorpho[box1$data$variable=="Observed richness"])
MW.obs = cbind(MW.obs$statistic, MW.obs$p.value)
# ____________ Shannon number MW
MW.shannon = wilcox.test(box1$data$value[box1$data$variable=="Shannon (ENS)"]~box1$data$geomorpho[box1$data$variable=="Shannon (ENS)"])
MW.shannon =cbind(MW.shannon$statistic, MW.shannon$p.value)
# ____________ All computed
MW.test <- rbind(MW.obs, MW.shannon)
rownames(MW.test) = c("Observed richness","Shannon (ENS)")
colnames(MW.test) = c('W stat' , 'p-value')
MW.test## W stat p-value
## Observed richness 13 0.1742945
## Shannon (ENS) 16 0.3449883
core_lutj_rel <- transform_sample_counts(core_lutj, function(x) x / sum(x) )
save(core_lutj_rel , file = paste0(dir_data_cleaning, "core_lutj_rel.RData"))
# _____ PCOA coda & permanova ------
library(vegan)
library(ape)
matrix <- vegdist(core_lutj_rel@otu_table, method = "bray")
save(matrix, file = paste0(dir_data_cleaning, "beta_matrices.RData"))
pcoa.sub <- pcoa(matrix)
save(pcoa.sub, file = paste0(dir_data_cleaning, "pcoa.values.RData"))
# Contruction of the table for graphic
pcoa_coord <- pcoa.sub$vectors[,1:3]
# _____ PCOA Plot-----
library(stringr)
samp_data <- data.frame(sample_data(core_lutj_rel))
save(samp_data,file = paste0(dir_data_cleaning, "samp_data.RData"))
#write.table(samp_data, file = paste0(dir_data_cleaning, "samp_data.txt"), sep = "\t", quote = F)
hull <- cbind(pcoa_coord, samp_data)
hull$geomorpho <- str_replace_all(hull$geomorpho, c("macroalgal"= "M" , "coral"="C"))
# What is the percentage of the explicative variance?
paste("Axis 1 :",percent(pcoa.sub$values$Relative_eig[1])) ## [1] "Axis 1 : 23%"
paste("Axis 2 :",percent(pcoa.sub$values$Relative_eig[2])) ## [1] "Axis 2 : 14%"
paste("Axis 3 :",percent(pcoa.sub$values$Relative_eig[3]))## [1] "Axis 3 : 11%"
paste("Axis 4 :",percent(pcoa.sub$values$Relative_eig[4]))## [1] "Axis 4 : 10%"
pcoa_lutj <- ggplot() +
geom_hline(yintercept=0, colour="lightgrey", linetype = 2) +
geom_vline(xintercept=0, colour="lightgrey", linetype = 2) +
geom_point(data = hull, aes(x=Axis.1, y=Axis.2, color = geomorpho), alpha = 0.7, size = 8, shape = 16) +
xlab(paste("PCo1 (", round(pcoa.sub$values$Relative_eig[1]*100, 1), "%)")) +
ylab(paste("PCo2 (", round(pcoa.sub$values$Relative_eig[2]*100, 1), "%)")) +
scale_colour_manual(values= palette)+
theme_bw() +
coord_equal() +
theme(axis.title.x = element_text(size=20, family = "serif"), # remove x-axis labels
axis.title.y = element_text(size=20 , family = "serif"),
axis.text.x = element_text(size = 18, family = "serif"),
axis.text.y = element_text(size = 18, family = "serif"),
legend.text = element_text(size = 16, family = "serif"),
legend.title = element_text(size = 16,family = "serif"), # remove y-axis labels
panel.background = element_blank(),
panel.grid.major = element_blank(), #remove major-grid labels
panel.grid.minor = element_blank(), #remove minor-grid labels
plot.background = element_blank(),
axis.title = element_text(size = 25))+
labs(colour = "Reef")
pcoa_lutjThe sampling size for the members of the Lutjanidae family was enough to test several parameters as effect of species, genus, geomorphology of the reef, site and substrat.
adonis(matrix ~ tax1, data = samp_data)##
## Call:
## adonis(formula = matrix ~ tax1, data = samp_data)
##
## Permutation: free
## Number of permutations: 999
##
## Terms added sequentially (first to last)
##
## Df SumsOfSqs MeanSqs F.Model R2 Pr(>F)
## tax1 4 1.8202 0.45505 1.1221 0.33277 0.229
## Residuals 9 3.6496 0.40552 0.66723
## Total 13 5.4698 1.00000
adonis(matrix ~ genus, data = samp_data)##
## Call:
## adonis(formula = matrix ~ genus, data = samp_data)
##
## Permutation: free
## Number of permutations: 999
##
## Terms added sequentially (first to last)
##
## Df SumsOfSqs MeanSqs F.Model R2 Pr(>F)
## genus 1 0.4498 0.44982 1.0753 0.08224 0.323
## Residuals 12 5.0200 0.41833 0.91776
## Total 13 5.4698 1.00000
adonis(matrix ~ geomorpho, data = samp_data)##
## Call:
## adonis(formula = matrix ~ geomorpho, data = samp_data)
##
## Permutation: free
## Number of permutations: 999
##
## Terms added sequentially (first to last)
##
## Df SumsOfSqs MeanSqs F.Model R2 Pr(>F)
## geomorpho 1 0.5446 0.54463 1.327 0.09957 0.134
## Residuals 12 4.9252 0.41043 0.90043
## Total 13 5.4698 1.00000
adonis(matrix ~ site, data = samp_data)##
## Call:
## adonis(formula = matrix ~ site, data = samp_data)
##
## Permutation: free
## Number of permutations: 999
##
## Terms added sequentially (first to last)
##
## Df SumsOfSqs MeanSqs F.Model R2 Pr(>F)
## site 4 1.4953 0.37382 0.84648 0.27337 0.829
## Residuals 9 3.9746 0.44162 0.72663
## Total 13 5.4698 1.00000
adonis(matrix ~ substrat, data = samp_data)##
## Call:
## adonis(formula = matrix ~ substrat, data = samp_data)
##
## Permutation: free
## Number of permutations: 999
##
## Terms added sequentially (first to last)
##
## Df SumsOfSqs MeanSqs F.Model R2 Pr(>F)
## substrat 2 0.6466 0.32332 0.73738 0.11822 0.897
## Residuals 11 4.8232 0.43847 0.88178
## Total 13 5.4698 1.00000
beta_reef <- betadisper(matrix, samp_data$tax1)
permutest(beta_reef)##
## Permutation test for homogeneity of multivariate dispersions
## Permutation: free
## Number of permutations: 999
##
## Response: Distances
## Df Sum Sq Mean Sq F N.Perm Pr(>F)
## Groups 4 0.52105 0.130262 5.5167 999 0.009 **
## Residuals 9 0.21251 0.023612
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
In this family, we could test at host level the effect on the species Aprion virescens.
dir_data_cleaning <- paste0(path, "/analyses/04_data_cleaning/Fish/Shift/APV/")
dir.create(dir_data_cleaning, recursive = T)
APV <- subset_samples(sey_gut, tax1 == "Aprion virescens")
APV <- prune_taxa(names(which(colSums(APV@otu_table)>0)), APV)
save(APV, file = paste0(dir_data_cleaning, "APV.RData"))
APVrff <- prune_samples(sample_sums(APV) >= min(sample_sums(APV)), APV)
APVrff <- rarefy_even_depth(APVrff, sample.size = min(sample_sums(APV)))
save(APVrff, file = paste0(dir_data_cleaning, "APVrff_ps.RData"))
core_APV <- subset_samples(sey_gut_core, tax1 == "Aprion virescens")
core_APV <- prune_taxa(names(which(colSums(core_APV@otu_table)>0)),core_APV)
save(core_APV, file = paste0(dir_data_cleaning, "core_APV.RData"))box1 = plot_richness(APVrff , measures = c("Observed","Shannon") , color = "geomorpho")
# ____ Shannon index (ENS)
box1$data[box1$data$variable == "Shannon",]$value = exp(box1$data[box1$data$variable == "Shannon",]$value)
levels(box1$data$variable)= c("Observed richness","Shannon (ENS)")
levels(box1$data$geomorpho) = c("C" , "M")
library(ggsignif)
library(ggplot2)
palette = c("darkblue","darkred")
sey_APV_alpha_div= ggplot(box1$data, aes(x = factor(geomorpho) , y = value, color = factor(geomorpho))) +
geom_jitter(position = position_jitter(width = .20), alpha = 0.5, size = 3) +
scale_colour_manual(values= palette) + theme_bw()+
geom_boxplot(alpha=0.1, outlier.colour = NA) +
theme(axis.title.x = element_blank(),
axis.title.y = element_blank(),
axis.text = element_text(family = "serif",size = 14),
axis.text.x = element_text(family = "serif",size = 20, angle = 0),
axis.text.y = element_text(family = "serif",size = 20, angle = 0),
panel.grid.major = element_blank(),
panel.grid.minor = element_blank(),
legend.position="none",
strip.text.x = element_text(size=15, family = "serif")) +
facet_wrap( ~ variable, nrow=1, ncol=5, scales = "free") +
geom_signif(comparisons = list(c("C", "M")), map_signif_level = TRUE, textsize=8, color="black", family = "serif")
sey_APV_alpha_div # ____ Test MW ----
library(pgirmess)
# Does exist differences in diversity between types of host?
# ____________ Observed richness MW
MW.obs = wilcox.test(box1$data$value[box1$data$variable=="Observed richness"]~box1$data$geomorpho[box1$data$variable=="Observed richness"])
MW.obs = cbind(MW.obs$statistic, MW.obs$p.value)
# ____________ Shannon number MW
MW.shannon = wilcox.test(box1$data$value[box1$data$variable=="Shannon (ENS)"]~box1$data$geomorpho[box1$data$variable=="Shannon (ENS)"])
MW.shannon =cbind(MW.shannon$statistic, MW.shannon$p.value)
# ____________ All computed
MW.test <- rbind(MW.obs, MW.shannon)
rownames(MW.test) = c("Observed richness","Shannon (ENS)")
colnames(MW.test) = c('W stat' , 'p-value')
MW.test## W stat p-value
## Observed richness 4 0.6285714
## Shannon (ENS) 7 0.8571429
dir_data_cleaning <- paste0(path, "/analyses/04_data_cleaning/Fish/Shift/Lethrinus/")
dir.create(dir_data_cleaning, recursive = T)
sey_leth <- subset_samples(sey_gut, family == "Lethrinidae")
sey_leth <- prune_taxa(names(which(colSums(sey_leth@otu_table)>0)), sey_leth)
save(sey_leth, file = paste0(dir_data_cleaning, "sey_leth.RData"))
rff_leth <- prune_samples(sample_sums(sey_leth) >= min(sample_sums(sey_leth)) , sey_leth)
rff_leth <- rarefy_even_depth(rff_leth ,min(sample_sums(rff_leth)))
save(rff_leth , file = paste0(dir_data_cleaning, "rff_leth.RData"))
core_leth <- subset_samples(sey_gut_core, genus == "Lethrinus")
core_leth <- prune_taxa(names(which(colSums(core_leth@otu_table)>0)),core_leth)
save(core_leth, file = paste0(dir_data_cleaning, "core_leth.RData"))box1 = plot_richness(rff_leth , measures = c("Observed","Shannon") , color = "geomorpho")
box1$data# ____ Shannon index (ENS)
box1$data[box1$data$variable == "Shannon",]$value = exp(box1$data[box1$data$variable == "Shannon",]$value)
levels(box1$data$variable)= c("Observed richness","Shannon (ENS)")
levels(box1$data$geomorpho) = c("C" , "M")
library(ggsignif)
library(ggplot2)
palette = c("darkblue","darkred")
sey_lethr_alpha_div= ggplot(box1$data, aes(x = factor(geomorpho) , y = value, color = factor(geomorpho))) +
geom_jitter(position = position_jitter(width = .20), alpha = 0.5, size = 3) +
scale_colour_manual(values= palette) + theme_bw()+
geom_boxplot(alpha=0.1, outlier.colour = NA) +
theme(axis.title.x = element_blank(),
axis.title.y = element_blank(),
axis.text = element_text(family = "serif",size = 14),
axis.text.x = element_text(family = "serif",size = 24, angle = 0),
axis.text.y = element_text(family = "serif",size = 20, angle = 0),
panel.grid.major = element_blank(),
panel.grid.minor = element_blank(),
legend.position="none",
strip.text.x = element_text(size=15, family = "serif")) +
facet_wrap( ~ variable, nrow=1, ncol=5, scales = "free") +
geom_signif(comparisons = list(c("C", "M")), map_signif_level = TRUE, textsize=5, color="black", family = "serif")
sey_lethr_alpha_div # ____ Test MW ----
library(pgirmess)
# Does exist differences in diversity between types of host?
# ____________ Observed richness MW
MW.obs = wilcox.test(box1$data$value[box1$data$variable=="Observed richness"]~box1$data$geomorpho[box1$data$variable=="Observed richness"])
MW.obs = cbind(MW.obs$statistic, MW.obs$p.value)
# ____________ Shannon number MW
MW.shannon = wilcox.test(box1$data$value[box1$data$variable=="Shannon (ENS)"]~box1$data$geomorpho[box1$data$variable=="Shannon (ENS)"])
MW.shannon =cbind(MW.shannon$statistic, MW.shannon$p.value)
# ____________ All computed
MW.test <- rbind(MW.obs, MW.shannon)
rownames(MW.test) = c("Observed richness","Shannon (ENS)")
colnames(MW.test) = c('W stat' , 'p-value')
MW.test## W stat p-value
## Observed richness 84 0.9593578
## Shannon (ENS) 83 1.0000000
core_leth_rel <- transform_sample_counts(core_leth, function(x) x / sum(x) )
save(core_leth_rel , file = paste0(dir_data_cleaning, "core_leth_rel.RData"))
# _____ PCOA coda & permanova ------
library(vegan)
library(ape)
matrix <- vegdist(core_leth_rel@otu_table, method = "bray")
save(matrix, file = paste0(dir_data_cleaning, "beta_matrices.RData"))
pcoa.sub <- pcoa(matrix)
save(pcoa.sub, file = paste0(dir_data_cleaning, "pcoa.values.RData"))
# Contruction of the table for graphic
pcoa_coord <- pcoa.sub$vectors[,1:3]
# _____ PCOA Plot-----
library(stringr)
samp_data <- data.frame(sample_data(core_leth_rel))
save(samp_data,file = paste0(dir_data_cleaning, "samp_data.RData"))
#write.table(samp_data, file = paste0(dir_data_cleaning, "samp_data.txt"), quote = F, sep = "\t")
hull <- cbind(pcoa_coord, samp_data)
hull$geomorpho <- str_replace_all(hull$geomorpho, c("macroalgal"= "M" , "coral"="C"))
# What is the percentage of the explicative variance?
paste("Axis 1 :",percent(pcoa.sub$values$Relative_eig[1])) ## [1] "Axis 1 : 14%"
paste("Axis 2 :",percent(pcoa.sub$values$Relative_eig[2])) ## [1] "Axis 2 : 9%"
paste("Axis 3 :",percent(pcoa.sub$values$Relative_eig[3]))## [1] "Axis 3 : 8%"
paste("Axis 4 :",percent(pcoa.sub$values$Relative_eig[4]))## [1] "Axis 4 : 7%"
pcoa_lethr <- ggplot() +
geom_hline(yintercept=0, colour="lightgrey", linetype = 2) +
geom_vline(xintercept=0, colour="lightgrey", linetype = 2) +
geom_point(data = hull, aes(x=Axis.1, y=Axis.2, color = geomorpho), alpha = 0.7, size = 8, shape = 16) +
scale_color_manual(values = c("Darkblue","Darkred"))+
xlab(paste("PCo1 (", round(pcoa.sub$values$Relative_eig[1]*100, 1), "%)")) +
ylab(paste("PCo2 (", round(pcoa.sub$values$Relative_eig[2]*100, 1), "%)")) +
theme_bw() +
coord_equal() +
theme(axis.title.x = element_text(size=20, family = "serif"), # remove x-axis labels
axis.title.y = element_text(size=20 , family = "serif"),
axis.text.x = element_text(size = 18, family = "serif"),
axis.text.y = element_text(size = 18, family = "serif"),
legend.text = element_text(size = 16, family = "serif"),
legend.title = element_text(size = 16,family = "serif"), # remove y-axis labels
panel.background = element_blank(),
panel.grid.major = element_blank(), #remove major-grid labels
panel.grid.minor = element_blank(), #remove minor-grid labels
plot.background = element_blank(),
axis.title = element_text(size = 25))+
labs(colour = "Reef")
pcoa_lethradonis(matrix ~ tax1 , data = samp_data)##
## Call:
## adonis(formula = matrix ~ tax1, data = samp_data)
##
## Permutation: free
## Number of permutations: 999
##
## Terms added sequentially (first to last)
##
## Df SumsOfSqs MeanSqs F.Model R2 Pr(>F)
## tax1 5 2.2512 0.45024 1.0924 0.21452 0.191
## Residuals 20 8.2430 0.41215 0.78548
## Total 25 10.4943 1.00000
adonis(matrix ~ island, data = samp_data)##
## Call:
## adonis(formula = matrix ~ island, data = samp_data)
##
## Permutation: free
## Number of permutations: 999
##
## Terms added sequentially (first to last)
##
## Df SumsOfSqs MeanSqs F.Model R2 Pr(>F)
## island 1 0.4864 0.48645 1.1666 0.04635 0.185
## Residuals 24 10.0078 0.41699 0.95365
## Total 25 10.4943 1.00000
adonis(matrix ~ geomorpho, data = samp_data)##
## Call:
## adonis(formula = matrix ~ geomorpho, data = samp_data)
##
## Permutation: free
## Number of permutations: 999
##
## Terms added sequentially (first to last)
##
## Df SumsOfSqs MeanSqs F.Model R2 Pr(>F)
## geomorpho 1 0.4896 0.48960 1.1745 0.04665 0.182
## Residuals 24 10.0047 0.41686 0.95335
## Total 25 10.4943 1.00000
adonis(matrix ~ substrat, data = samp_data)##
## Call:
## adonis(formula = matrix ~ substrat, data = samp_data)
##
## Permutation: free
## Number of permutations: 999
##
## Terms added sequentially (first to last)
##
## Df SumsOfSqs MeanSqs F.Model R2 Pr(>F)
## substrat 2 0.8541 0.42706 1.0189 0.08139 0.412
## Residuals 23 9.6401 0.41914 0.91861
## Total 25 10.4943 1.00000
adonis(matrix ~ site, data = samp_data)##
## Call:
## adonis(formula = matrix ~ site, data = samp_data)
##
## Permutation: free
## Number of permutations: 999
##
## Terms added sequentially (first to last)
##
## Df SumsOfSqs MeanSqs F.Model R2 Pr(>F)
## site 5 2.3167 0.46334 1.1332 0.22076 0.108
## Residuals 20 8.1775 0.40888 0.77924
## Total 25 10.4943 1.00000
Inside this family, we are able to test two different species : Lethrinus mahsena and Lethrinus enigmaticus.
dir_data_cleaning <- paste0(path, "/analyses/04_data_cleaning/Fish/Shift/LEM/")
dir.create(dir_data_cleaning, recursive = T)
LEM <- subset_samples(sey_gut, tax1 == "Lethrinus mahsena")
LEM <- prune_taxa(names(which(colSums(LEM@otu_table)>0)), LEM)
save(LEM, file = paste0(dir_data_cleaning, "LEM.RData"))
LEMrff <- prune_samples(sample_sums(LEM) >= min(sample_sums(LEM)) , LEM)
LEMrff <- rarefy_even_depth(LEMrff ,min(sample_sums(LEMrff)))
save(LEMrff , file = paste0(dir_data_cleaning, "LEMrff.RData"))
core_LEM <- subset_samples(sey_gut_core, tax1 == "Lethrinus mahsena")
core_LEM <- prune_taxa(names(which(colSums(core_LEM@otu_table)>0)), core_LEM)
save(core_LEM, file = paste0(dir_data_cleaning, "core_LEM.RData"))box1 = plot_richness(LEMrff , measures = c("Observed","Shannon") , color = "geomorpho")
# ____ Shannon index (ENS)
box1$data[box1$data$variable == "Shannon",]$value = exp(box1$data[box1$data$variable == "Shannon",]$value)
levels(box1$data$variable)= c("Observed richness","Shannon (ENS)")
levels(box1$data$geomorpho) = c("C" , "M")
library(ggsignif)
library(ggplot2)
palette = c("darkblue","darkred")
sey_LEM_alpha_div= ggplot(box1$data, aes(x = factor(geomorpho) , y = value, color = factor(geomorpho))) +
geom_jitter(position = position_jitter(width = .20), alpha = 0.5, size = 3) +
scale_colour_manual(values= palette) + theme_bw()+
geom_boxplot(alpha=0.1, outlier.colour = NA) +
theme(axis.title.x = element_blank(),
axis.title.y = element_blank(),
axis.text = element_text(family = "serif",size = 14),
axis.text.x = element_text(family = "serif",size = 24, angle = 0),
axis.text.y = element_text(family = "serif",size = 20, angle = 0),
panel.grid.major = element_blank(),
panel.grid.minor = element_blank(),
legend.position="none",
strip.text.x = element_text(size=15, family = "serif")) +
facet_wrap( ~ variable, nrow=1, ncol=5, scales = "free") +
geom_signif(comparisons = list(c("C", "M")), map_signif_level = TRUE, textsize=5, color="black", family = "serif")
sey_LEM_alpha_div # ____ Test MW ----
library(pgirmess)
# Does exist differences in diversity between types of host?
# ____________ Observed richness MW
MW.obs = wilcox.test(box1$data$value[box1$data$variable=="Observed richness"]~box1$data$geomorpho[box1$data$variable=="Observed richness"])
MW.obs = cbind(MW.obs$statistic, MW.obs$p.value)
# ____________ Shannon number MW
MW.shannon = wilcox.test(box1$data$value[box1$data$variable=="Shannon (ENS)"]~box1$data$geomorpho[box1$data$variable=="Shannon (ENS)"])
MW.shannon =cbind(MW.shannon$statistic, MW.shannon$p.value)
# ____________ All computed
MW.test <- rbind(MW.obs, MW.shannon)
rownames(MW.test) = c("Observed richness","Shannon (ENS)")
colnames(MW.test) = c('W stat' , 'p-value')
MW.test## W stat p-value
## Observed richness 8 0.6475058
## Shannon (ENS) 9 0.8333333
core_LEM_rel <- transform_sample_counts(core_LEM, function(x) x / sum(x) )
save(core_LEM_rel , file = paste0(dir_data_cleaning, "core_LEM_rel.RData"))
# _____ PCOA coda & permanova ------
library(vegan)
library(ape)
matrix <- vegdist(core_LEM_rel@otu_table, method = "bray")
save(matrix, file = paste0(dir_data_cleaning, "beta_matrices.RData"))
pcoa.sub <- pcoa(matrix)
save(pcoa.sub, file = paste0(dir_data_cleaning, "pcoa.values.RData"))
# Contruction of the table for graphic
pcoa_coord <- pcoa.sub$vectors[,1:3]
# _____ PCOA Plot-----
library(stringr)
samp_data <- data.frame(sample_data(core_LEM_rel))
save(samp_data,file = paste0(dir_data_cleaning, "samp_data.RData"))
hull <- cbind(pcoa_coord, samp_data)
hull$geomorpho <- str_replace_all(hull$geomorpho, c("macroalgal"= "M" , "coral"="C"))
# What is the percentage of the explicative variance?
paste("Axis 1 :",percent(pcoa.sub$values$Relative_eig[1])) ## [1] "Axis 1 : 20%"
paste("Axis 2 :",percent(pcoa.sub$values$Relative_eig[2])) ## [1] "Axis 2 : 16%"
paste("Axis 3 :",percent(pcoa.sub$values$Relative_eig[3]))## [1] "Axis 3 : 13%"
paste("Axis 4 :",percent(pcoa.sub$values$Relative_eig[4]))## [1] "Axis 4 : 12%"
pcoa_LEM <- ggplot() +
geom_hline(yintercept=0, colour="lightgrey", linetype = 2) +
geom_vline(xintercept=0, colour="lightgrey", linetype = 2) +
geom_point(data = hull, aes(x=Axis.1, y=Axis.2, color = geomorpho), alpha = 0.7, size = 8, shape = 16) +
scale_color_manual(values = c("darkblue","darkred"))+
xlab(paste("PCo1 (", round(pcoa.sub$values$Relative_eig[1]*100, 1), "%)")) +
ylab(paste("PCo2 (", round(pcoa.sub$values$Relative_eig[2]*100, 1), "%)")) +
theme_bw() +
coord_equal() +
theme(axis.title.x = element_text(size=20, family = "serif"), # remove x-axis labels
axis.title.y = element_text(size=20 , family = "serif"),
axis.text.x = element_text(size = 18, family = "serif"),
axis.text.y = element_text(size = 18, family = "serif"),
legend.text = element_text(size = 16, family = "serif"),
legend.title = element_text(size = 16,family = "serif"), # remove y-axis labels
panel.background = element_blank(),
panel.grid.major = element_blank(), #remove major-grid labels
panel.grid.minor = element_blank(), #remove minor-grid labels
plot.background = element_blank(),
axis.title = element_text(size = 25))+
labs(colour = "Reef")
pcoa_LEMFor this species, we can only test the geomorpho and site effect.
adonis(matrix ~ geomorpho, data = samp_data)##
## Call:
## adonis(formula = matrix ~ geomorpho, data = samp_data)
##
## Permutation: free
## Number of permutations: 999
##
## Terms added sequentially (first to last)
##
## Df SumsOfSqs MeanSqs F.Model R2 Pr(>F)
## geomorpho 1 0.4824 0.48240 1.1014 0.12102 0.259
## Residuals 8 3.5038 0.43797 0.87898
## Total 9 3.9862 1.00000
adonis(matrix ~ site, data = samp_data)##
## Call:
## adonis(formula = matrix ~ site, data = samp_data)
##
## Permutation: free
## Number of permutations: 999
##
## Terms added sequentially (first to last)
##
## Df SumsOfSqs MeanSqs F.Model R2 Pr(>F)
## site 2 0.9778 0.48888 1.1375 0.24529 0.181
## Residuals 7 3.0084 0.42977 0.75471
## Total 9 3.9862 1.00000
dir_data_cleaning <- paste0(path, "/analyses/04_data_cleaning/Fish/Shift/LEE/")
dir.create(dir_data_cleaning, recursive = T)
LEE <- subset_samples(sey_gut, tax1 == "Lethrinus mahsena")
LEE <- prune_taxa(names(which(colSums(LEE@otu_table)>0)), LEE)
save(LEE, file = paste0(dir_data_cleaning, "LEE.RData"))
LEErff <- prune_samples(sample_sums(LEE) >= min(sample_sums(LEE)) , LEE)
LEErff <- rarefy_even_depth(LEErff ,min(sample_sums(LEErff)))
save(LEErff , file = paste0(dir_data_cleaning, "LEErff.RData"))
core_LEE <- subset_samples(sey_gut_core, tax1 == "Lethrinus mahsena")
core_LEE <- prune_taxa(names(which(colSums(core_LEE@otu_table)>0)), core_LEE)
save(core_LEE, file = paste0(dir_data_cleaning, "core_LEE.RData"))box1 = plot_richness(LEErff , measures = c("Observed","Shannon") , color = "geomorpho")
# ____ Shannon index (ENS)
box1$data[box1$data$variable == "Shannon",]$value = exp(box1$data[box1$data$variable == "Shannon",]$value)
levels(box1$data$variable)= c("Observed richness","Shannon (ENS)")
levels(box1$data$geomorpho) = c("C" , "M")
library(ggsignif)
library(ggplot2)
palette = c("darkblue","darkred")
sey_LEE_alpha_div= ggplot(box1$data, aes(x = factor(geomorpho) , y = value, color = factor(geomorpho))) +
geom_jitter(position = position_jitter(width = .20), alpha = 0.5, size = 3) +
scale_colour_manual(values= palette) + theme_bw()+
geom_boxplot(alpha=0.1, outlier.colour = NA) +
theme(axis.title.x = element_blank(),
axis.title.y = element_blank(),
axis.text = element_text(family = "serif",size = 14),
axis.text.x = element_text(family = "serif",size = 24, angle = 0),
axis.text.y = element_text(family = "serif",size = 20, angle = 0),
panel.grid.major = element_blank(),
panel.grid.minor = element_blank(),
legend.position="none",
strip.text.x = element_text(size=15, family = "serif")) +
facet_wrap( ~ variable, nrow=1, ncol=5, scales = "free") +
geom_signif(comparisons = list(c("C", "M")), map_signif_level = TRUE, textsize=5, color="black", family = "serif")
sey_LEE_alpha_div # ____ Test MW ----
library(pgirmess)
# Does exist differences in diversity between types of host?
# ____________ Observed richness MW
MW.obs = wilcox.test(box1$data$value[box1$data$variable=="Observed richness"]~box1$data$geomorpho[box1$data$variable=="Observed richness"])
MW.obs = cbind(MW.obs$statistic, MW.obs$p.value)
# ____________ Shannon number MW
MW.shannon = wilcox.test(box1$data$value[box1$data$variable=="Shannon (ENS)"]~box1$data$geomorpho[box1$data$variable=="Shannon (ENS)"])
MW.shannon =cbind(MW.shannon$statistic, MW.shannon$p.value)
# ____________ All computed
MW.test <- rbind(MW.obs, MW.shannon)
rownames(MW.test) = c("Observed richness","Shannon (ENS)")
colnames(MW.test) = c('W stat' , 'p-value')
MW.test## W stat p-value
## Observed richness 8 0.6666667
## Shannon (ENS) 9 0.8333333
core_LEE_rel <- transform_sample_counts(core_LEE, function(x) x / sum(x) )
save(core_LEE_rel , file = paste0(dir_data_cleaning, "core_LEE_rel.RData"))
# _____ PCOA coda & permanova ------
library(vegan)
library(ape)
matrix <- vegdist(core_LEE_rel@otu_table, method = "bray")
save(matrix, file = paste0(dir_data_cleaning, "beta_matrices.RData"))
pcoa.sub <- pcoa(matrix)
save(pcoa.sub, file = paste0(dir_data_cleaning, "pcoa.values.RData"))
# Contruction of the table for graphic
pcoa_coord <- pcoa.sub$vectors[,1:3]
# _____ PCOA Plot-----
library(stringr)
samp_data <- data.frame(sample_data(core_LEE_rel))
save(samp_data,file = paste0(dir_data_cleaning, "samp_data.RData"))
hull <- cbind(pcoa_coord, samp_data)
hull$geomorpho <- str_replace_all(hull$geomorpho, c("macroalgal"= "M" , "coral"="C"))
# What is the percentage of the explicative variance?
paste("Axis 1 :",percent(pcoa.sub$values$Relative_eig[1])) ## [1] "Axis 1 : 20%"
paste("Axis 2 :",percent(pcoa.sub$values$Relative_eig[2])) ## [1] "Axis 2 : 16%"
paste("Axis 3 :",percent(pcoa.sub$values$Relative_eig[3]))## [1] "Axis 3 : 13%"
paste("Axis 4 :",percent(pcoa.sub$values$Relative_eig[4]))## [1] "Axis 4 : 12%"
pcoa_LEE <- ggplot() +
geom_hline(yintercept=0, colour="lightgrey", linetype = 2) +
geom_vline(xintercept=0, colour="lightgrey", linetype = 2) +
geom_point(data = hull, aes(x=Axis.1, y=Axis.2, color = geomorpho), alpha = 0.7, size = 8, shape = 16) +
scale_color_manual(values = c("darkblue","darkred"))+
xlab(paste("PCo1 (", round(pcoa.sub$values$Relative_eig[1]*100, 1), "%)")) +
ylab(paste("PCo2 (", round(pcoa.sub$values$Relative_eig[2]*100, 1), "%)")) +
theme_bw() +
coord_equal() +
theme(axis.title.x = element_text(size=20, family = "serif"), # remove x-axis labels
axis.title.y = element_text(size=20 , family = "serif"),
axis.text.x = element_text(size = 18, family = "serif"),
axis.text.y = element_text(size = 18, family = "serif"),
legend.text = element_text(size = 16, family = "serif"),
legend.title = element_text(size = 16,family = "serif"), # remove y-axis labels
panel.background = element_blank(),
panel.grid.major = element_blank(), #remove major-grid labels
panel.grid.minor = element_blank(), #remove minor-grid labels
plot.background = element_blank(),
axis.title = element_text(size = 25))+
labs(colour = "Reef")
pcoa_LEEFor this species, we can only test the geomorpho and site effect.
adonis(matrix ~ geomorpho, data = samp_data)##
## Call:
## adonis(formula = matrix ~ geomorpho, data = samp_data)
##
## Permutation: free
## Number of permutations: 999
##
## Terms added sequentially (first to last)
##
## Df SumsOfSqs MeanSqs F.Model R2 Pr(>F)
## geomorpho 1 0.4824 0.48240 1.1014 0.12102 0.308
## Residuals 8 3.5038 0.43797 0.87898
## Total 9 3.9862 1.00000
Finally, it seems that the shift coral - macroalgae doesn’t impact the enteric core microbiome of the reef fish while the core epiphyte is influenced by the environmental parameters as geomorphology, substrat and location. This has already been described in other studies relating on the microbiomes of algae ( Serebryakova et al., 2018). A very weak signal is detected for the herbivores Scarus ghobban and would have maybe been significant with a higher sampling size. Moreover, the observed changes in the diversity and bacterial compositions associated to primary producers with Sargassum and turf algae, which hold a pivotal role as major nutritional resources for Siganidae and Acanthuridae ( Cheal et al., 2010), let us think to further consequences on the enteric microbial compositions of these latter. Nowadays, It is quite admitted that diet is one of the strongest determinant of fish enteric microbiomes. Moreover, some studies in aquaculture have demonstrated that a modification of the food quality markedly influences the variability and diversity of enteric microbiomes ( Butt et al., 2019). Also, as demonstrated in zebrafish hosts, the absence or decreasing availability of food may cause a stress and act as an important factor of the variability of the enteric microbiome even dysbiosis ( Cantas et al., 2012). The resistence of the enteric microbiome against environmental perturbation (modification/absence of the diet because of habitat fragmentation or anthropogenization) has been observed in non human primate in Africa where the species identity remained the strongest determinant ( McCord et al., 2014). In our case, what are thus the stronger determinants that influence the enteric core microbiome?
The diet and the taxonomy are known to be major drivers of the microbial assemblage in the gut of fishes. We have seen previously some effect at different scale. Here, we will describe and determine which are the differences between the diversity of the differnt types of diet and families and try to detect some biomarkers which could play a functional role in the nutrition of the reef fish.
dir_data_cleaning <- paste0(path, "/analyses/04_data_cleaning/Fish/Core/A_div/")
dir.create(dir_data_cleaning)Let’s see at granulous scale of diet :Herbivorous vs carnivorous with the 3 omnivorous fishes considered as herbivorous.
# ____ Shannon (ENS), Faith PD and LCBD ----
levels(sample_data(seyrff_gut)$diet3)[3] = "Herbivorous"
box1 = plot_richness(seyrff_gut , measures = c("Observed","Shannon") , color = "diet3")
# ____ Shannon index (ENS)
box1$data[box1$data$variable == "Shannon",]$value = exp(box1$data[box1$data$variable == "Shannon",]$value)
levels(box1$data$variable) = c("Observed richness", "Shannon (ENS)")
library(ggsignif)
library(ggplot2)
palette = c("darkred" , "darkgreen")
sey_alpha_div_diet = ggplot(box1$data, aes(x = factor(diet3) , y = value, color = factor(diet3))) +
geom_jitter(position = position_jitter(width = .20), alpha = 0.5, size = 3) + theme_bw() +
geom_boxplot(alpha=0.1, outlier.colour = NA) +
scale_color_manual(values = palette)+
theme(axis.title.x = element_blank(),
axis.title.y = element_blank(),
axis.text = element_text(family = "serif",size = 14),
axis.text.x = element_text(family = "serif",size = 15, angle = 0),
axis.text.y = element_text(family = "serif",size = 20, angle = 0),
panel.grid.major = element_blank(),
panel.grid.minor = element_blank(),
legend.position="none",
strip.text.x = element_text(size=15, family = "serif")) +
geom_signif(comparisons = list(c("Carnivorous", "Herbivorous")), map_signif_level = TRUE, textsize=5, color="black", family = "serif")+
facet_wrap( ~ variable, nrow=1, ncol=5, scales = "free")
sey_alpha_div_diet# ____ Dunn test ----
library(pgirmess)
library(dunn.test)
# Does exist differences in diversity between diets?
# ____________ Observed dunn
dunn.test(box1$data$value[box1$data$variable=="Observed richness"],box1$data$diet1[box1$data$variable=="Observed richness"], method = "bonferroni")## Kruskal-Wallis rank sum test
##
## data: x and group
## Kruskal-Wallis chi-squared = 13.507, df = 4, p-value = 0.01
##
##
## Comparison of x by group
## (Bonferroni)
## Col Mean-|
## Row Mean | FC H HD MI
## ---------+--------------------------------------------
## H | -3.125502
## | 0.0089*
## |
## HD | -2.664975 -0.559770
## | 0.0385 1.0000
## |
## MI | -2.415064 1.242383 1.197258
## | 0.0787 1.0000 1.0000
## |
## OM | -2.591145 -1.194984 -0.760599 -1.579624
## | 0.0478 1.0000 1.0000 0.5710
##
## alpha = 0.05
## Reject Ho if p <= alpha/2
# ____________ Shannon number dunn
dunn.test(box1$data$value[box1$data$variable=="Shannon (ENS)"],box1$data$diet1[box1$data$variable=="Shannon (ENS)"], method = "bonferroni")## Kruskal-Wallis rank sum test
##
## data: x and group
## Kruskal-Wallis chi-squared = 13.0985, df = 4, p-value = 0.01
##
##
## Comparison of x by group
## (Bonferroni)
## Col Mean-|
## Row Mean | FC H HD MI
## ---------+--------------------------------------------
## H | -3.240358
## | 0.0060*
## |
## HD | -2.271655 0.009037
## | 0.1155 1.0000
## |
## MI | -2.377953 1.495953 0.752121
## | 0.0870 0.6733 1.0000
## |
## OM | -2.474232 -1.010313 -0.902549 -1.471990
## | 0.0668 1.0000 1.0000 0.7051
##
## alpha = 0.05
## Reject Ho if p <= alpha/2
dir_data_cleaning <- paste0(path, "/analyses/04_data_cleaning/Fish/Core/B_div/")
dir.create(dir_data_cleaning)
core_gut_rel <- transform_sample_counts(sey_gut_core, function(x) x / sum(x) )
save(core_gut_rel , file = paste0(dir_data_cleaning, "core_gut_rel.RData"))
# _____ PCOA coda & permanova ------
library(vegan)
library(ape)
matrix <- vegdist(core_gut_rel@otu_table, method = "bray")
save(matrix, file = paste0(dir_data_cleaning, "beta_matrices.RData"))
pcoa.sub <- pcoa(matrix)
save(pcoa.sub, file = paste0(dir_data_cleaning, "pcoa.values.RData"))
# Contruction of the table for graphic
pcoa_coord <- pcoa.sub$vectors[,1:3]
# _____ PCOA Plot-----
library(stringr)
samp_data <- data.frame(sample_data(core_gut_rel))
levels(samp_data$diet3) <- c("Carnivorous", "Herbivorous","Herbivorous")
save(samp_data,file = paste0(dir_data_cleaning, "samp_data.RData"))
#write.table(samp_data, file = paste0(dir_data_cleaning, "samp_data.txt"), quote = F, sep = "\t")
hull <- cbind(pcoa_coord, samp_data)
hull$geomorpho <- str_replace_all(hull$geomorpho, c("macroalgal"= "M" , "coral"="C"))
# What is the percentage of the explicative variance?
paste("Axis 1 :",percent(pcoa.sub$values$Relative_eig[1])) ## [1] "Axis 1 : 9%"
paste("Axis 2 :",percent(pcoa.sub$values$Relative_eig[2])) ## [1] "Axis 2 : 7%"
paste("Axis 3 :",percent(pcoa.sub$values$Relative_eig[3]))## [1] "Axis 3 : 6%"
paste("Axis 4 :",percent(pcoa.sub$values$Relative_eig[4]))## [1] "Axis 4 : 6%"
pcoa_diet1 <- ggplot() +
geom_hline(yintercept=0, colour="lightgrey", linetype = 2) +
geom_vline(xintercept=0, colour="lightgrey", linetype = 2) +
geom_point(data = hull, aes(x=Axis.1, y=Axis.2, color = diet1), alpha = 0.7, size = 8) +
scale_color_manual(values = c("Darkred", "Darkgreen","Darkkhaki","Darkorange","Darkorchid"))+
xlab(paste("PCo1 (", round(pcoa.sub$values$Relative_eig[1]*100, 1), "%)")) +
ylab(paste("PCo2 (", round(pcoa.sub$values$Relative_eig[2]*100, 1), "%)")) +
theme_bw() +
coord_equal() +
theme(axis.title.x = element_text(size=20, family = "serif"), # remove x-axis labels
axis.title.y = element_text(size=20 , family = "serif"),
axis.text.x = element_text(size = 18, family = "serif"),
axis.text.y = element_text(size = 18, family = "serif"),
legend.text = element_text(size = 16, family = "serif"),
legend.title = element_text(size = 16,family = "serif"), # remove y-axis labels
panel.background = element_blank(),
panel.grid.major = element_blank(), #remove major-grid labels
panel.grid.minor = element_blank(), #remove minor-grid labels
plot.background = element_blank(),
axis.title = element_text(size = 25))+
labs(colour = "Diet")
pcoa_diet1pcoa_diet3 <- ggplot() +
geom_hline(yintercept=0, colour="lightgrey", linetype = 2) +
geom_vline(xintercept=0, colour="lightgrey", linetype = 2) +
geom_point(data = hull, aes(x=Axis.1, y=Axis.2, color = diet3), alpha = 0.7, size = 8) +
scale_color_manual(values = c("Darkred", "Darkgreen"))+
xlab(paste("PCo1 (", round(pcoa.sub$values$Relative_eig[1]*100, 1), "%)")) +
ylab(paste("PCo2 (", round(pcoa.sub$values$Relative_eig[2]*100, 1), "%)")) +
theme_bw() +
coord_equal() +
theme(axis.title.x = element_text(size=20, family = "serif"), # remove x-axis labels
axis.title.y = element_text(size=20 , family = "serif"),
axis.text.x = element_text(size = 18, family = "serif"),
axis.text.y = element_text(size = 18, family = "serif"),
legend.text = element_text(size = 16, family = "serif"),
legend.title = element_text(size = 16,family = "serif"), # remove y-axis labels
panel.background = element_blank(),
panel.grid.major = element_blank(), #remove major-grid labels
panel.grid.minor = element_blank(), #remove minor-grid labels
plot.background = element_blank(),
axis.title = element_text(size = 25))+
labs(colour = "Diet")
pcoa_diet3The herbivores and carnivores seem to compose two different clusters well defined. Let’s test the influence of the diet:
# Between diets
adonis(matrix ~ diet1, data = samp_data)##
## Call:
## adonis(formula = matrix ~ diet1, data = samp_data)
##
## Permutation: free
## Number of permutations: 999
##
## Terms added sequentially (first to last)
##
## Df SumsOfSqs MeanSqs F.Model R2 Pr(>F)
## diet1 4 3.479 0.86983 2.0004 0.07845 0.001 ***
## Residuals 94 40.874 0.43483 0.92155
## Total 98 44.353 1.00000
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The taxonomy can also play a role in the determination of the microbial composition, so what is the effect of the taxonomy?
# Between families
adonis(matrix ~ tax1, data = samp_data)##
## Call:
## adonis(formula = matrix ~ tax1, data = samp_data)
##
## Permutation: free
## Number of permutations: 999
##
## Terms added sequentially (first to last)
##
## Df SumsOfSqs MeanSqs F.Model R2 Pr(>F)
## tax1 35 19.986 0.57104 1.4764 0.45062 0.001 ***
## Residuals 63 24.367 0.38677 0.54938
## Total 98 44.353 1.00000
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
To test the effect of the diet inside a taxonomic level, we freeze the diet and see what is the effect of the family, and inside the family, the effect of the genus, and inside a genus, the effect of a species. Actually, we tested the effect between species and genus, sharing the same family, with the effect of the diet blocked (in case of different diet inside family which can occur between piscivorous and invertivores in the Lutjanidae family).
adonis(matrix ~ diet3/family / genus / tax1, strata = samp_data$diet3, data = samp_data) ##
## Call:
## adonis(formula = matrix ~ diet3/family/genus/tax1, data = samp_data, strata = samp_data$diet3)
##
## Blocks: strata
## Permutation: free
## Number of permutations: 999
##
## Terms added sequentially (first to last)
##
## Df SumsOfSqs MeanSqs F.Model R2 Pr(>F)
## diet3 1 1.613 1.61348 4.1716 0.03638 0.001 ***
## diet3:family 7 5.354 0.76487 1.9776 0.12071 0.001 ***
## diet3:family:genus 10 4.993 0.49928 1.2909 0.11257 0.001 ***
## diet3:family:genus:tax1 17 8.026 0.47212 1.2207 0.18096 0.002 **
## Residuals 63 24.367 0.38677 0.54938
## Total 98 44.353 1.00000
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
To detect now biomarkers in herbivores and carnivores, we apply an discriminant analysis with LEfsE analysis.
Proceed the “lefse_type_sey.txt” in a LEfse Galaxy and retrieve data “LDA Effect Size” with the associated “Plot LEfSe Results” in the LDA folder.You have to rename the former file “LDA_results” and run the following code.
library(plyr)
LDA_Effect_Size <- read.delim(paste0(dir_data_cleaning, "LDA_results"), header=FALSE)
LDA_carn <- subset(LDA_Effect_Size , LDA_Effect_Size$V3 == "Carnivorous")
LDA_herb <- subset(LDA_Effect_Size , LDA_Effect_Size$V3 == "Herbivorous")
LDA_diet <- rbind(LDA_carn,LDA_herb)
LDA_phylum <- LDA_diet$V1
names(LDA_phylum) <- as.factor(".Phylum.Class.Order.Family.Genus") #paste the names(LDA_phylum) in excel
#write.table(LDA_phylum, file=paste0(dir_data_cleaning, "LDA_phylum.txt"), sep="\t", row.names=F, col.names=F, quote=FALSE)
library(readr)
LDA_phylum <- read_delim(paste0(dir_data_cleaning, "LDA_phylum.txt"),
".", escape_double = FALSE, col_types = cols(X1 = col_skip()),
trim_ws = TRUE)
LDA_phylum_diet <- cbind(LDA_phylum , LDA_diet[,2:5])
colnames(LDA_phylum_diet) <- c(names(LDA_phylum)[1:5] , "LDA_res" , "Diet" , "LDA_res2" , "sig")
write.table(LDA_phylum_diet, file=paste0(dir_data_cleaning, "LDA_phylum_diet.txt"), sep="\t", row.names=F, col.names=T, quote=FALSE)
# Select your significativ value and taxa
LEFSE_sig <- subset(LDA_phylum_diet, LDA_phylum_diet$LDA_res2 >= 3)
LEFSE_sig_genus <- subset(LEFSE_sig, LEFSE_sig$Genus %in% as.character(na.exclude(LEFSE_sig$Genus)))
LEFSE_sig_genus_vec <- paste(LEFSE_sig_genus$Phylum, LEFSE_sig_genus$Class, LEFSE_sig_genus$Order, LEFSE_sig_genus$Family, LEFSE_sig_genus$Genus, sep = "|")
biomarkers <- paste(LEFSE_sig_genus$Order, LEFSE_sig_genus$Genus, sep = "|")
save(biomarkers, file = paste0(dir_data_cleaning, "biomarkers.RData"))
sey_samp <- sey_gut.genus@sam_data
write.table(sey_samp, file=paste0(dir_data_cleaning, "sey_samp.txt"), sep="\t", row.names=F, col.names=T, quote=FALSE)
levels(sample_data(sey_gut.genus)$diet3) <- c("Carnivorous", "Herbivorous","Herbivorous")
physeq_phylum <- sey_gut.genus %>%
transform_sample_counts(function(x) {x/sum(x)} ) %>% # Transform to rel. abundance
psmelt() %>%
filter(Abundance > 0) %>% # Melt to long format
arrange(Genus)
physeq_phylum$Abundance <- physeq_phylum$Abundance/nrow(sample_data(sey_gut.genus))
physeq_phylum$tax <- paste(physeq_phylum$Phylum, physeq_phylum$Class, physeq_phylum$Order, physeq_phylum$Family, physeq_phylum$Genus, sep = "|")
save(physeq_phylum, file= paste0(dir_data_cleaning, "physeq_phylum.RData"))
#write.table(physeq_phylum ,file = paste0(dir_data_cleaning, "physeq_phylum.txt"), sep="\t",quote = F)
Diet_Pol_sey <- physeq_phylum
Diet_Pol_sey$tax <- paste(Diet_Pol_sey$Order, Diet_Pol_sey$Genus, sep = "|")
length(physeq_phylum$tax)## [1] 945
df_sey <- Diet_Pol_sey[,c("diet3","tax3","tax","Abundance")]
colnames(df_sey) <- c("family","item","score","value")
save(df_sey, file = paste0(dir_data_cleaning, "df_sey.RData"))Let’s organize our data to plot corresponding colours depending on the compartment.
dir_data_cleaning <- paste0(path, "/analyses/04_data_cleaning/Fish/Core/LDA/")
load(paste0(dir_data_cleaning, "biomarkers.RData"))
load(paste0(dir_data_cleaning, "df_sey.RData"))
library(plyr)
print_biomarkers <- levels(factor(df_sey[df_sey$score %in% biomarkers,]$score)) # All biomarkers present in the sey.diet_genus RData file
others <- levels(factor(df_sey[!df_sey$score %in% biomarkers,]$score)) # All taxa which are not biomarkers
df_sey[!df_sey$score %in% biomarkers,]$score <- "Z-Other" # Call them other (Z to figure at the end of the list)
# Some biomarkers are in common so Output the frequences.
freq_herb <- table(df_sey[df_sey$family == "Herbivorous",]$score)/length(df_sey[df_sey$family == "Herbivorous",]$score)*100
freq_herb <- freq_herb[!names(freq_herb) == "Z-Other"]
freq_carn <- table(df_sey[df_sey$family == "Carnivorous",]$score)/length(df_sey[df_sey$family == "Carnivorous",]$score)*100
freq_carn <- freq_carn[!names(freq_carn) == "Z-Other"]
# Keep the common biomarkers of the herbivorous and carnivorous and the respective frequencies in each compartment
biom_common <- names(freq_herb[which(names(freq_herb) %in% names(freq_carn))])
freq_herb_common <- as.data.frame(freq_herb[names(freq_herb) %in% biom_common])
freq_carn_common <- as.data.frame(freq_carn[names(freq_carn) %in% biom_common])
freq_common <- cbind(freq_herb_common, freq_carn_common$Freq)
colnames(freq_common) <- c("Biomarkers", "Freq_herb" ,"Freq_carn")
# Class the biomarkers in the compartment where the value is the higher
biom_herb_to_keep <- freq_common$Biomarkers[which(freq_common$Freq_herb > freq_common$Freq_carn, T)]
biom_carn_to_keep <- freq_common$Biomarkers[which(freq_common$Freq_carn > freq_common$Freq_herb, T)]
# See which are the biomarkers for the herbivorous and which are the biomarkers for the carnivorous
biom_herb <- levels(factor(df_sey[df_sey$family == "Herbivorous",]$score))
biom_herb <- biom_herb[!biom_herb == "Z-Other"]
biom_herb <- biom_herb[!biom_herb %in% biom_carn_to_keep]
biom_carn <- levels(factor(df_sey[df_sey$family == "Carnivorous",]$score))
biom_carn <- biom_carn[!biom_carn == "Z-Other"]
biom_carn <- biom_carn[!biom_carn %in% biom_herb_to_keep]
# Color choice
herb_colors <- c("yellow" , "darkkhaki", "darkolivegreen",
"darkolivegreen2","gold4","chartreuse",
"aquamarine","darkgreen", "darkcyan",
"darkseagreen", "gold3", "darkslategray")
pie(rep(1, 12), col= herb_colors)carn_colors <- c("darkred","darksalmon","darkorchid4",
"blueviolet","darkblue",
"blue3", "brown1","deeppink")
pie(rep(1, 8), col= carn_colors)polar_col <- c(carn_colors , herb_colors , 'Z_Other'="black")
# Add a prefix H (for herbivorous) or C (for carnivorous) in their respectiv biomarkers. In this way, they will be listed following
# their respectiv compartment and no their names (easier for the add of the color in the ggplot)
df_sey[df_sey$score %in% biom_herb,]$score <- paste("H",df_sey[df_sey$score %in% biom_herb,]$score, sep= "-")
df_sey[df_sey$score %in% biom_carn,]$score <- paste("C",df_sey[df_sey$score %in% biom_carn,]$score, sep= "-")source(paste0(dir_refdb, "polarHistogram.R"))
p <- polarHistogram(df_sey , familyLabel=T, innerRadius = 0.2, spaceFamily =7, circleProportion = 0.85)
p + ggplot2::scale_fill_manual(values= polar_col) +
theme(legend.text = element_text(family = "serif", size = 3)) +
theme(legend.position= "none") # to hide the legend because of the size